Amplitude modulation cues for perceptual voicing distinctions in noise
Journal: The Journal of the Acoustical Society of America
Copyright (c) 1998 American Institute of Physics. All rights reserved.



. . . vocal folds can modulate the pressure source behind the constriction
sufficiently to generate an acoustic pitch-rate amplitude-modulation
cue in high-frequency regions. Predictions of the
perceptual experiment's...
... synthesized by varying VOT from 10 to 45 ms in 5 ms steps. Fifteen
different pitch contours were generated by designating F0
targets at three points in the stimulus: initial vowel, onset...
Author: Kacha, A.; Grenez, F.; Bettens, F.; Schoentgen, J.

Author Affiliation: Dept. Waves & Signals, Univ. Libre de Bruxelles, Brussels, Belgium

Conference Title: Proceedings of International Conference on Signals and Electronic Systems ICSES'04 p. 417-20
Editor(s): Bartkowiak, M.; Domanski, M.; Grajek, T.; Stasinski, R.; Swierczynski, R.; Rosinski, T.
Publication Date: 2004 Country of Publication: Poland xvii+603 pp.
ISBN: 83 906074 7 6 Material Identity Number: XX-2005-00450

Conference Title: Proceedings of International Conference on Signals and Electronic Systems ICSES'04
Conference Date: 13-15 Sept. 2004 Conference Location: Poznan, Poland
Language: English
Subfile: B

Copyright 2005, IEE

Abstract: ...correlates with the perceived degree of hoarseness. The coefficients of the time-varying two-sided pitch 
prediction filter are estimated adaptively by means of a recursive least squares algorithm. 
Publication Date: 2000 Country of Publication: USA 6 vol. lxxx+3906 pp.
Conference Title: Proceedings of 2000 International Conference on Acoustics, Speech and Signal Processing
Conference Sponsor: IEEE; Signal Process. Soc

Conference Date: 5-9 June 2000 Conference Location: Istanbul, Turkey 
Language: English 
Subfile: B 

Copyright 2000, IEE

Abstract: ...for techniques that reduce the perceptibility of the errors in the reconstructed signal. Pitch-synchronous
estimation of the linear-prediction filter and pitch-synchronous updating of the adaptive codebook reduce the
coefficient-estimation error and increase the relative contribution of the adaptive codebook component to the 
synthesized signal... 
Title: Linear and nonlinear adaptive filtering and their application to speech intelligibility enhancement
University: Eindhoven Univ. Technol., Netherlands

Publisher: Eindhoven Univ. Technol., Eindhoven, Netherlands

Publication Date: 15 Sept. 1992 Country of Publication: Netherlands 218 pp. 

Language: English 

Subfile: B C 

Abstract: ...noise itself is interference-speech as well. For this purpose, new linear and nonlinear adaptive filtering 
techniques and robust pitch contour estimation algorithms were developed and applied to co-channel speech 
separation. These new adaptive filtering algorithms and the pitch contour estimation algorithm are suitable for other 
applications apart from speech intelligibility enhancement. 
Title: Harmonic filtering for joint estimation of pitch and voiced source with single-microphone input
Author: Lee, S.W.; Soong, Frank K.; Ching, P.C. 

Corporate Source: Department of Electronic Engineering Chinese University of Hong Kong, Hong Kong, Hong Kong 
Conference Title: 9th European Conference on Speech Communication and Technology 
Conference Location: Lisbon, Portugal Conference Date: 20050904-20050908 
E.I. Conference No.: 67499

Source: 9th European Conference on Speech Communication and Technology 9th European Conference on Speech 
Communication and Technology, Eurospeech Interspeech 2005. 
Publication Year: 2005 
Language: English 

Abstract: ...the harmonic structure of voiced speech, pitch information of one source is extracted from the pitch 
prediction filter and the output residual becomes the estimate of the other source. The procedure is iterated 
successively with a summation constraint. From the evolution of pitch prediction filter, it is shown that the iterative 
harmonic filtering with the summation constraint is effective to... 
Fulltext available through: ScienceDirect
Ei Compendex(R)
E.I. Conference No.: 42612
IEEE, Piscataway, NJ, USA, 94CH3387-8. p 189-192

Publication Year: 1994 

CODEN: IPRODJ ISSN: 0736-7791 

Language: English 
(Harmonic estimation based on instantaneous frequency and its application to the pitch determination of speech)
Abe, T.; Kobayashi, T.; Imai, S.

Precision & Intelligence Lab., Tokyo Inst. of Technol., Yokohama, Japan
Document type: journal article Language: English
Conference Title: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE Cat. No. 06CH37812C) p. 1-73-6

Publisher: IEEE, Piscataway, NJ, USA

Publication Date: 2006 Country of Publication: USA CD-ROM pp.
ISBN: 1 4244 0469 X Material Identity Number: XX-2006-00798
U.S. Copyright Clearance Center Code: 1-4244-0469-X/06/$20.00

Conference Title: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing 
Conference Date: 14-19 May 2006 Conference Location: Toulouse, France 
Language: English 
Subfile: C 

Copyright 2006, The Institution of Engineering and Technology 
Author: Eide, E.M.; Picheny, M.A.

Abstract: ...to-speech system. First, we investigate the pooling of data from multiple speakers for building statistical 
models to predict pitch and duration, and present listening test results which show that the expressiveness of our 
TTS... 
Conference Title: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP'03)
Conference Sponsor: IEEE Signal Process. Soc

Conference Date: 6-10 April 2003 Conference Location: Hong Kong, China 
Language: English 
Subfile: B C 

Copyright 2003, IEE

Author: Eide, E.; Aaron, A.; Bakis, R.; Cohen, R.; Donovan, R.; Hamza, W.; Mathes, T.; Picheny, M.; Polkosky, M.;
Smith, M...

Abstract: ...led to significant gains in the output quality. On the algorithms side, we have introduced statistical models
for predicting pitch and duration targets which replace the rule-based target generation previously employed. 
Additionally, we have... 
...capacitor 16, to output of inductance coil 3 adjacent to inductance coil 2. Part of energy in low-frequency filter
where inductance coils 1 through 3 are not separated by screens is transferred from inductance...
Speech synthesizing method for human listener, involves determining smooth pitch contour between adjacent
anchor points by linearly interpolating between pitch values of smooth pitch contour at anchor points Inventor:
BAKIS R Alerting Abstract ...The method involves generating a sequence of phonetic units representative of a target
utterance, and determining a pitch contour for the target utterance, where the pitch contour has a set of linear pitch
contour segments. The pitch contour is filtered to determine pitch values of a smooth pitch contour at anchor points.
The smooth pitch contour is determined between adjacent anchor points by linearly interpolating between the pitch
values of the smooth pitch contour at the anchor points. ... ADVANTAGE - The method enables fast and efficient
smoothing of discontinuous, non-smooth pitch contours obtained from concatenation of the speech segments while
improving a quality of a synthesized signal... Original Publication Data by Authority Inventor name & address: Bakis,
Raimo... Original Abstracts: TTS synthesis systems are provided which implement computationally fast and efficient
pitch contour smoothing methods for determining smooth pitch contours for non-smooth pitch contours, which
closely track the non-smooth pitch contours. For example, a TTS method includes generating a sequence of phonetic
units representative of a target utterance, determining a pitch contour for the target utterance, the pitch contour
comprising a plurality of linear pitch contour segments, wherein each linear pitch contour segment has start and end
times at anchor points of the pitch contour, filtering the pitch contour to determine pitch values of a smooth pitch
contour at the anchor points, and determining the smooth pitch contour between adjacent anchor points by linearly
interpolating between the pitch values of the smooth pitch contour at the anchor points. ...Claims: for speech synthesis,
comprising: generating a sequence of phonetic units representative of a target utterance; determining a pitch contour
for the target utterance, the pitch contour comprising a plurality of linear pitch contour segments, wherein each linear
pitch contour segment has start and end times at anchor points of the pitch contour; filtering the pitch contour to
determine pitch values of a smooth pitch contour at the anchor points; and determining the smooth pitch contour
between adjacent anchor points by linearly interpolating between the pitch values of the smooth pitch contour at the
anchor points.
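
The two steps named in this record (filter the piecewise-linear contour to get smoothed pitch values at the anchor points, then interpolate linearly between those values) can be sketched as below. This is only an illustration under stated assumptions, not the patented method: the moving-average filter, its `half_width` and `taps` parameters, the continuous (time, pitch) anchor representation with strictly increasing times, and all function names are hypothetical.

```python
from bisect import bisect_right


def piecewise_linear(anchors, t):
    """Evaluate a piecewise-linear contour given as (time, pitch) anchors.

    Assumes anchor times are strictly increasing; values are held constant
    outside the first and last anchor.
    """
    times = [a[0] for a in anchors]
    if t <= times[0]:
        return anchors[0][1]
    if t >= times[-1]:
        return anchors[-1][1]
    i = bisect_right(times, t) - 1
    (t0, p0), (t1, p1) = anchors[i], anchors[i + 1]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)


def smooth_at_anchors(anchors, half_width=0.05, taps=11):
    """Filter the raw contour, evaluating the output only at anchor times.

    Assumption: a moving average over [t - half_width, t + half_width]
    stands in for whatever smoothing filter the source actually uses.
    """
    smoothed = []
    for t, _ in anchors:
        step = 2.0 * half_width / (taps - 1)
        samples = [piecewise_linear(anchors, t - half_width + k * step)
                   for k in range(taps)]
        smoothed.append((t, sum(samples) / taps))
    return smoothed


def smooth_contour(anchors, t, half_width=0.05, taps=11):
    """Smoothed pitch at time t: linear interpolation between smoothed anchors."""
    return piecewise_linear(smooth_at_anchors(anchors, half_width, taps), t)


if __name__ == "__main__":
    raw = [(0.0, 100.0), (0.1, 140.0), (0.2, 100.0)]
    for t in (0.0, 0.05, 0.1, 0.15, 0.2):
        print(t, round(smooth_contour(raw, t), 2))
```

Because the filter is evaluated only at the anchor times, the cost grows with the number of anchors rather than the number of output samples, which is consistent with the "fast and efficient" advantage the abstract claims.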
Program storage device in text-to-speech conversion system, stores instructions for automatically generating
marked-up text corresponding to spoken utterance using prosodic parameters such as pitch contour and
...for automatically generating marked-up text corresponding to spoken utterance using prosodic parameters
such as pitch contour and duration contour of utterance ...Inventor: BAKIS R EIDE E M ...NOVELTY - The
prosodic parameters such as pitch contour, duration contour, energy contour information of spoken utterance are
determined. A marked-up text corresponding to the... Original Publication Data by Authority... Inventor name &
address: Bakis, Raimo; Eide, Ellen M
Speech synthesizing method for speech-to-speech translation, involves increasing energy in low frequency
components of pitch contour generated for synthesized speech
...synthesizing method for speech-to-speech translation, involves increasing energy in low frequency components
of pitch contour generated for synthesized speech Original Titles: Method and apparatus for producing natural
sounding pitch contours in a speech synthesizer Inventor: BAKIS R EIDE E M Alerting Abstract ...NOVELTY
- A pitch contour for synthesized speech is generated. The amount of energy in the low frequency components of the
pitch contour is increased. ...The naturalness of sounding speech is achieved, as the energy in low frequency
components of pitch contour is increased... Original Publication Data by Authority Inventor name & address: Eide,
Ellen Marie; Bakis, Raimo Original Abstracts: A speech synthesis system is disclosed that utilizes a pitch
contour resulting in a more natural-sounding speech. The present invention modifies the predicted pitch, b(t), for
synthesized speech using a low frequency energy booster. The low frequency energy booster interpolates the discrete...
... values, if necessary, and increase the amount of energy of the pitch contour associated with low frequency values,
such as all frequency values below 10 Hertz. The amount of energy of the pitch contour associated with low frequency
values can be increased, for example, by adding band-limited noise (a carrier signal) to the pitch contour, b(t), or by
filtering the pitch values with an impulse response filter having a pole at the desired... present invention serves to add
vibrato to the original pitch contour, b(t), and thereby improves the naturalness of the synthetic waveform.
...Claims: speech, comprising: generating a pitch contour for said synthesized speech; and increasing an amount of
energy in low frequency components of said pitch contour.
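
The idea of raising the energy of the pitch contour b(t) below about 10 Hz can be illustrated with a small sketch. It makes several assumptions that are not in the record: the contour is sampled at a hypothetical frame rate of 100 Hz, a single 5 Hz sinusoid stands in for the band-limited noise carrier, and the helper names `low_freq_energy` and `add_slow_modulation` are invented for the example.

```python
import math


def low_freq_energy(contour, frame_rate, cutoff_hz=10.0):
    """Energy of the contour in DFT bins strictly between 0 Hz and cutoff_hz."""
    n = len(contour)
    energy = 0.0
    for k in range(1, n // 2 + 1):
        if k * frame_rate / n >= cutoff_hz:
            break
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(contour))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(contour))
        energy += (re * re + im * im) / n
    return energy


def add_slow_modulation(contour, frame_rate, rate_hz=5.0, depth_hz=2.0):
    """Add a slow sinusoid (assumed stand-in for band-limited noise) to b(t)."""
    return [b + depth_hz * math.sin(2 * math.pi * rate_hz * i / frame_rate)
            for i, b in enumerate(contour)]


if __name__ == "__main__":
    frame_rate = 100.0                                # assumed contour sampling rate
    b = [120.0 - 0.1 * i for i in range(200)]         # gently falling pitch, in Hz
    boosted = add_slow_modulation(b, frame_rate)
    print(low_freq_energy(b, frame_rate) < low_freq_energy(boosted, frame_rate))
```

A filter-based variant, which the record also mentions, would replace `add_slow_modulation` with a recursive filter whose pole sits at the frequency to be emphasized; the pole location is elided in the record and is not guessed here.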
Pitch contours generation in text-to-speech system, involves determining stored stress level closer to stress
levels... Original Titles: Methods for generating pitch and duration contours in a text to speech system, ...Inventor:
EIDE E M Alerting Abstract ...associated with closest stress levels of stress and pitch level pairs are copied to
generate pitch contours of input text. ...stress level corresponding to secondary stress, and second stress level
corresponding to primary stress. A pitch contour model is trained based on training text read by at least one speaker to
generate training sentences. The sequences of stress and pitch level pairs correspond to training sentences. The
pitch contour of training sentences is calculated from laryngograph data as a function of time by noting... USE - For
generating pitch contours in text-to-speech (TTS) system... quality of synthesized speech improves with the number
of training utterances available for selecting the pitch contour to be synthesized, the use of only lexical stress contours
as features for selecting the pitch contour enables a relatively small, efficiently searched database of pitch contours to
suffice for very good quality prosody in synthesis. A smaller number of sentences are... Original Publication Data by
Authority Inventor name & address: Eide, Ellen M... Original Abstracts: A method for automatically generating pitch
contours in a text to speech (TTS) system, the system converting input text into an output... the closest stored stress
levels of the stress and pitch level pairs to generate the pitch contours of the input text. Features illustrative of various
modes of the invention include stress and... Claims: A method for generating pitch contours in a text to speech (TTS)
system, the system converting input text into an output acoustic signal
Specification: ...features such as LPC (linear predictive coding) parameters of signal or features of the smoothed pitch
contour and its derivatives.

For this invention, the following strategy may be adopted. First, take into... a linear regression for voiced part of
speech, i.e. the line that fits the pitch contour. The relative voiced energy can also be calculated as the proportion of
voiced energy to... in effect correspond to resonances of the vocal tract, the human voice also contains a pitch,
modulated by the speaker, which corresponds to the frequency at which the larynx modulates the air... speech to text
systems.
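
The regression feature mentioned above (the line that fits the pitch contour over the voiced part of speech) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `fit_pitch_line`, the per-frame `voiced` flags, and the choice of ordinary least squares are assumptions.

```python
def fit_pitch_line(times, f0, voiced):
    """Least-squares line f0 ~ slope * t + intercept over voiced frames only."""
    pts = [(t, f) for t, f, v in zip(times, f0, voiced) if v]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two voiced frames")
    mean_t = sum(t for t, _ in pts) / n
    mean_f = sum(f for _, f in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0.0:
        raise ValueError("voiced frames must not all share the same time")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in pts)
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    f0 = [100.0, 0.0, 104.0, 106.0, 0.0]     # 0.0 marks unvoiced frames
    voiced = [True, False, True, True, False]
    print(fit_pitch_line(t, f0, voiced))
```

The slope of the fitted line is a compact summary of whether pitch rises or falls over the utterance, which is the kind of contour feature the passage lists.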

The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio and a signal. See
R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch... 1979-28, Lincoln Labs, June 11, 1979. For $k_1$
close to -1, there is more low frequency energy in the signal than high-frequency energy, and vice versa for $k_1$ close
to 1... and their goodness values C(k), dynamic programming is now used to obtain an optimum pitch contour which
includes an optimum voicing decision for each frame. The dynamic programming requires several frames...
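
The relation between the first reflection coefficient and the low/high frequency balance can be checked numerically. The sketch below assumes the common definition k_1 = -r(1)/r(0), where r is the autocorrelation of the frame; that sign convention is chosen because it matches the text (k_1 near -1 for low-frequency-dominated signals), and the function name is hypothetical.

```python
import math


def first_reflection_coefficient(frame):
    """Return k_1 = -r(1)/r(0) for a frame of samples (assumed convention)."""
    r0 = sum(v * v for v in frame)
    if r0 == 0.0:
        return 0.0                      # silent frame: no spectral balance to report
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return -r1 / r0


if __name__ == "__main__":
    n = 64
    slow = [math.cos(2 * math.pi * i / n) for i in range(n)]   # one slow cycle
    fast = [(-1.0) ** i for i in range(n)]                     # highest representable frequency
    print(first_reflection_coefficient(slow))   # close to -1
    print(first_reflection_coefficient(fast))   # close to +1
```

Adjacent samples of a slowly varying signal are almost equal, so r(1) is close to r(0) and k_1 approaches -1; for a rapidly alternating signal adjacent samples have opposite sign and k_1 approaches 1.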
Specification: ...features such as LPC (linear predictive coding) parameters of signal or features of the smoothed pitch
contour and its derivatives.

The following strategy may be adopted. First, take into account fundamental frequency... a linear regression for
voiced part of speech, i.e. the line that fits the pitch contour. The relative voiced energy can also be calculated as the
proportion of voiced energy to... in effect correspond to resonances of the vocal tract, the human voice also contains a
pitch, modulated by the speaker, which corresponds to the frequency at which the larynx modulates the air... speech
to text systems.

The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio and a signal. See
R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch... 1979, which is hereby incorporated by reference.
For $k_1$ close to -1, there is more low frequency energy in the signal than high-frequency energy, and vice versa for
$k_1$ close to 1... and their goodness values C(k), dynamic programming is now used to obtain an optimum pitch
contour which includes an optimum voicing decision for each frame. The dynamic programming requires several
frames...
Specification: ...coding, waveform matching in the high frequency region proves more difficult than matching in the
low frequency region. Thus, the energy of the high frequency region of a synthesized signal drops more than in the
low... the PP mode, the input original signal has been pitch-preprocessed to match the interpolated pitch contour, so no
closed-loop search is needed. The LTP excitation vector is computed using the interpolated pitch contour and the past
synthesized excitation.

Fourth, the encoder processing circuitry generates a new target signal... 6.65 kbps, the decision algorithm is as
follows. First, at the block 241, a prediction of the pitch lag pit for the current frame is determined as follows:
if (LTP_MODE_... $I_m - 1, 0$... The obtained index $I_m$ will be sent to the decoder.

The pitch lag contour, $\tau_c(n)$, is defined using both the current lag $P_m$ and the previous lag... the past modified
weighted speech buffer, $\hat{s}_w(m0 + n)$, $n < 0$, with the pitch lag contour, $\tau_c(n + m \cdot L_s)$, $m = 0, 1, 2$... where
$T_C(n)$ and $T_{IC}(n)$ are... search, and LTP excitation is directly computed according to past synthesized excitation
because the interpolated pitch contour is set for each frame. When the AMR coder operates with LTP-mode, the
pitch... $n < L\_SF$, is calculated by interpolating the past excitation (adaptive codebook) with the pitch lag
contour, $\tau_c(n + m \cdot L\_SF)$, $m = 0, 1, 2, 3$. The interpolation is... the innovation codebook gain) are coded
for every subframe. The LSF vector is coded using predictive vector quantization. The pitch lag has an integer part
and a fractional part constituting the pitch period. The quantized... result, for example, a codec might produce a
synthesized residual that has greater high frequency energy and lesser low frequency energy than would otherwise be
desired. In other words, the resultant synthesized residual would exhibit an...
...Parameters Measured from a Speech Signal: PITCH-RELATED PROSODIC CUES

Pitch or fundamental frequency F0, Pitch contour (possibly smoothened), Mean F0, Median F0, Maximum F0,
Minimum F0, F0 range, about 95th... to verbal prosody, that is, the set of suprasegmental features of speech, such as
stress, pitch, contour, juncture, intonation (melody), rhythm, tempo, loudness, voice quality (smooth, coarse, shaky,
creaky phonation, grumbly, etc... to separate speech
waveforms associated with various speakers (and optionally from ambient
sounds) using a low-frequency energy-based scheme (T. Choudhury and A.
Pentland, "Modeling Face-to-Face Communication Using the Sociometer...
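
The pitch-related cues listed in this record (mean, median, maximum and minimum F0, and F0 range) are simple summary statistics of the voiced F0 track. A minimal sketch, assuming the input is a list of F0 values in Hz for voiced frames only and using a hypothetical function name:

```python
import statistics


def f0_statistics(f0_voiced):
    """Summary statistics of an F0 track (Hz), voiced frames only."""
    if not f0_voiced:
        raise ValueError("no voiced frames")
    lowest, highest = min(f0_voiced), max(f0_voiced)
    return {
        "mean": statistics.mean(f0_voiced),
        "median": statistics.median(f0_voiced),
        "max": highest,
        "min": lowest,
        "range": highest - lowest,
    }


if __name__ == "__main__":
    print(f0_statistics([100.0, 110.0, 120.0, 130.0, 200.0]))
```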
Detailed Description:

...200, the spectral tilt is estimated in the frequency domain as a ratio between the energy
concentrated in low frequencies and the energy concentrated in high frequencies. However, it can be also
estimated in different ways such as... and $j_i$ is the index of the first bin in the ith critical band.

The energy in low frequencies is computed as the average of the energies in the first 10 critical bands...
bands have been excluded from the computation to improve the discrimination
between frames with high-energy concentration in low frequencies
(generally voiced) and with high-energy concentration in high frequencies (generally unvoiced). In between, the ...
...content is not characteristic for any of the classes and increases the decision confusion.

The energy in low frequencies is computed differently for long pitch periods and short pitch periods. For voiced
female speech... to the nearest harmonics are taken into account. Hence, if the structure is harmonic in low
frequencies, only high-energy terms will be included in the sum. On the other hand, if the... will be random and
the sum will be smaller.

Thus even unvoiced sounds with high energy content in low frequencies can be detected. This processing cannot be
done for longer pitch periods, as the frequency... sufficient. For pitch values larger than 128 or for a priori
unvoiced sounds the low frequency energy is computed per critical band as

$$E_l = \frac{1}{10} \sum_{k=0}^{9} E_{CB}(k)$$
A priori... Interpolation gives a delay value for every time instant of the frame.

After the delay contour is available, the pitch in the subframe to be coded currently is adjusted to follow this artificial
contour... by other signal-coding parameters. In signal modification, the signal is forced to follow a certain pitch
contour... that can be transmitted with 9 bits per frame. Good performance of long-term prediction...
Detailed Description:

...200, the spectral tilt is estimated in the frequency domain as a ratio between the energy
concentrated in low frequencies and the energy concentrated in high frequencies. However, it can be also
estimated in different ways such as... ...and $j_i$ is the index of the first bin in the ith critical band.

The energy in low frequencies is computed as the average of the energies in the first 10 critical bands. The...
been excluded from the computation to improve the discrimination
between frames with high-energy concentration in low frequencies
(generally voiced) and with high-energy concentration in high frequencies (generally unvoiced). In between, the ...
...characteristic for any of the classes and increases the decision confusion.

The energy in low frequencies is computed differently for long pitch periods and short pitch periods. For voiced
female speech... to the nearest harmonics are taken into account. Hence, if the structure is harmonic in low
frequencies, only high-energy terms will be included in the sum. On the other hand, if the structure is... will be
random and the sum will be smaller.

Thus even unvoiced sounds with high energy content in low frequencies can be detected. This processing cannot be
done for longer pitch periods, as... not sufficient. For pitch values larger than 128 or for a priori unvoiced sounds
the low frequency energy is computed per critical band as

$$E_l = \frac{1}{10} \sum_{k=0}^{9} E_{CB}(k)$$

...Interpolation gives a delay value for every time instant of the frame.

After the delay contour is available, the pitch in the subframe to be coded currently is adjusted to follow this artificial
contour... by other signal-coding parameters. In signal modification, the signal is forced to follow a certain pitch
contour... that can be transmitted with 9 bits per frame. Good performance of long-term prediction...



Claims:

...recited in claim 7, wherein said spectral tilt is proportionate to a ratio between the energy concentrated in low
frequencies and the energy concentrated in high frequencies of said signal frame.

9. A method as recited in claim 8, wherein said energy concentrated in low frequencies and said energy
concentrated in high frequencies are computed following the perceptual critical bands.

...A method recited in claim 24, wherein said spectral tilt is proportionate to a ratio
between the energy concentrated in low frequencies and the energy concentrated in high frequencies of said signal
frame.

26. A method as recited in claim 25, wherein said energy concentrated in low frequencies and said energy
concentrated in high frequencies are computed following the perceptual critical bands.

27. A method as...
Detailed Description:

...features such as LPC (linear predictive coding) parameters of signal or features of the smoothed pitch contour and
its derivatives.

For this invention, the following strategy may be adopted. First, take into... a linear regression for voiced part of
speech, i.e. the line that fits the pitch contour. The relative voiced energy can also be calculated as the proportion of
voiced energy to... in effect correspond to resonances of the vocal tract, the human voice also contains a pitch,
modulated by the speaker, which corresponds to the frequency at which the larynx modulates the air... speech to text
systems.

The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio and a signal. See
R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch... 1979, which is hereby incorporated by reference. For
$k_1$ close to -1, there is more low frequency energy in the signal than high-frequency energy, and vice versa for $k_1$
close to 1... their goodness values C(k), dynamic programming is now used to obtain an optimum pitch contour which
includes an optimum voicing decision for each frame. The dynamic programming requires several frames...



11/3K/8 (Item 5 from file: 349)
PCT FULLTEXT 

(c) 2007 WIPO/Thomson. All rights reserved. 
00784164 

A SYSTEM, METHOD, AND ARTICLE OF MANUFACTURE FOR DETECTING EMOTION IN VOICE 
SIGNALS THROUGH ANALYSIS OF A PLURALITY OF VOICE SIGNAL PARAMETERS 

SYSTEME, PROCEDE ET ARTICLE MANUFACTURE DE DETECTION DES EMOTIONS DANS LES 
SIGNAUX VOCAUX PAR ANALYSE D'UNE PLURALITE DE PARAMETRES DE SIGNAUX VOCAUX 



Patent Applicant/Patent Assignee: 

• ANDERSEN CONSULTING LLP; 1661 Page Mill Road, Palo Alto, CA 94304 
US; US(Residence); US(Nationality) 



Legal Representative: 



• HICKMAN Paul L(agent) 

Hickman, Coleman & Hughes, LLP, P.O. Box 52037, Palo Alto, CA 94303-0746; US; 





              Country   Number        Kind   Date
Patent        WO        200116938     A1     20010308
Application   WO        2000US23884          20000831
Priorities    US        99388027             19990831



Designated States: (All protection types applied unless otherwise stated - for applications 2004+) 



|EP| AT; BE; CH; CY; DE; DK; ES; FI; FR; GB;
GR; IE; IT; LU; MC; NL; PT; SE;

|OA| BF; BJ; CF; CG; CI; CM; GA; GN; GW; ML;
MR; NE; SN; TD; TG;

|AP| GH; GM; KE; LS; MW; MZ; SD; SL; SZ; TZ;
UG; ZW;

|EA| AM; AZ; BY; KG; KZ; MD; RU; TJ; TM;



Main International Patent Classes (Version 7): 



IPC            Level
G10L-017/00    Main


Publication Language: English 
Filing Language: English 
Fulltext word count: 39489 






Detailed Description:

...features such as LPC (linear predictive coding) parameters of signal or features of the smoothed pitch contour and
its derivatives.

For this invention, the following strategy may be adopted. First, take into... a linear regression for voiced part of
speech, i.e. the line that fits the pitch contour. The relative voiced energy can also be calculated as the proportion of
voiced energy to... in effect correspond to resonances of the vocal tract, the human voice also contains a pitch,
modulated by the speaker, which corresponds to the frequency at which the larynx modulates the air... speech to text
systems.

The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio and a signal. See
R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch... 1979, which is hereby incorporated by reference. For
$k_1$ close to -1, there is more low frequency energy in the signal than high-frequency energy, and vice versa for $k_1$
close to 1... their goodness values C(k), dynamic programming is now used to obtain an optimum pitch contour which
includes an optimum voicing decision for each frame. The dynamic programming requires several frames...
...as LPC (linear predictive coding) parameters of signal or features of the smoothed pitch contour and its
derivatives.

For this invention, the following strategy may be adopted. First, take into... a linear regression for voiced part of
speech, i.e. the line that fits the pitch contour. The relative voiced energy can also be calculated as the proportion of
voiced energy to... in effect correspond to resonances of the vocal tract, the human voice also contains a pitch,
modulated by the speaker, which corresponds to the frequency at which the larynx modulates the air... speech to text
systems.

The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio and a signal. See
R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch... 1979, which is hereby incorporated by reference. For
$k_1$ close to -1, there is more low frequency energy in the signal than high-frequency energy, and vice versa for $k_1$
close to 1... their goodness values C(k), dynamic programming is now used to obtain an optimum pitch contour which
includes an optimum voicing decision for each frame. The dynamic programming requires several frames...
Detailed Description:

...features such as LPC (linear predictive coding) parameters of signal or features of the smoothed pitch contour and
its derivatives.

For this invention, the following strategy may be adopted. First, take into... a linear regression for voiced part of speech,
i.e. the line that fits the pitch contour. The relative voiced energy can also be calculated as the proportion of voiced
energy to... in effect correspond to resonances of the vocal tract, the human voice also contains a pitch, modulated by
the speaker, which corresponds to the frequency at which the larynx modulates the air... speech to text systems.

The first reflection coefficient $k_1$ is approximately related to the high/low frequency energy ratio and a signal. See
R. J. McAulay, "Design of a Robust Maximum Likelihood Pitch... 1979, which is hereby incorporated by reference. For
$k_1$ close to -1, there is more low frequency energy in the signal than high-frequency energy, and vice versa for $k_1$
close to 1... their goodness values C(k), dynamic programming is now used to obtain an optimum pitch contour which
includes an optimum voicing decision for each frame. The dynamic programming requires several frames...
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...coding, waveform matching in the high frequency region proves more difficult than matching in the low frequency 
region. Thus, the energy of the high frequency region of a synthesized speech signal drops more than in the... the PP 
mode, the input original signal has been pitch-preprocessed to match the interpolated pitch contour, so no closed-loop 
search is needed. The LTP excitation vector is computed using the interpolated pitch contour and the past synthesized 
excitation. 

Fourth, the encoder processing circuitry generates a new target signal.. .65 kbps, the decision algorithm is as follows. 
First, at the block 241, a prediction of the pitch lag pit for the current frame is determined as follows. 

if (UP-MODE-M = l...max {I,„ -1,0). 

The obtained index !„ will be sent to the decoder. 

The pitch lag contour, rc(n), is defined using both the current lag P" and the previous lag Pn by warping the past 

modified weighted speech buffer, 9„(mO + n), n < 0, with the pitch lag contour, -r, (n + m - Ls), m = 0,1,2, 

S„ (mO + n) S„, (mO + n.,. search, and LTP excitation is directly computed according to past synthesized excitation 
because the interpolated pitch contour is set for each frame. When the AMR coder operates with LTP-mode, the 
pitch 0<=n<L, 

SFI, is calculated by interpolating the past excitation (adaptive 
codebook) with the pitch lag contour, r. (n + m-L 
SF), m = 0.1,2,3. The interpolation is 

performed using... the innovation codebook gain) are coded for every subframe. The LSF vector is coded using 
predictive vector quantization. The pitch lag has an integer part and a fractional part constituting the pitch period. The 
quantized... result, for example, a codec might produce a synthesized residual that has greater high frequency energy 
and lesser low frequency energy than would otherwise be desired. In other words, the resultant synthesized residual 
would exhibit an... 
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Abstract ...comprises synthesizing a wideband signal using the wideband LPCs and a wideband excitation signal, 
highpass filtering the synthesized wideband signal to produce a highband signal, and combining the highband signal 
with... 
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Specification: ...and determining its amplitude based on the amplitudes of the surrounding narrowband samples via 

lowpass filtering. However, there is weakness in the interpolated speech in that it does not contain any by a factor 

of 2 by inserting a zero sample following each input sample, highpass filtering with additional spectral shaping 38, and 

gain adjustment 40. Since the spectral folding operation reflects lower band into the upper band, i.e., highband, the 

purpose of the spectral shaping filter is to attenuate these signals in the highband. To reduce the spectral-gap about 

4kHz in the art. See, e.g., H. Yasukawa, Quality Enhancement of Band Limited Speech by Filtering and Multirate 

Techniques, in Proc. Intl. Conf. Spoken Language Processing, ICSLP '94, pp. 1607-1610 ... input signal. Preferably, 

fullwave rectification is used for this purpose. Again, highpass and spectral shaping filters 48 with a gain adjustment 

50 are applied to the rectified signal to generate a wideband excitation signal, to be shaped by the generated 

wideband spectral envelope 58. Highpass filtering and gain 60 extract the highband signal for combining with the 

original narrowband signal to logarithmic, typically extracted from an LP model. Almost all parametric techniques 

use an LPC synthesis filter for wideband signal generation (typically an intermediate wideband signal which is further 
highpass filtered), by exciting it with an appropriate wideband excitation signal. 

Parametric methods can be further classified for synthesizing the highband signal. The synthesis is carried out by 

exciting the LPC synthesis filter by a wideband excitation signal. The excitation signal is obtained by inverse filtering 

the input narrowband signal and spectral folding the resulting residual signal. The main disadvantage of air 

turbulences at constrictions in the vocal tract provide the excitation for unvoiced sounds. By filtering the speech signal 
with an inverse filter, whose coefficients are determined from the LPC model, the effect of the formants is removed... 
...LPC coefficients are used for synthesizing a wideband signal. The synthesized wideband signal is highpass filtered 

and summed with the original narrowband signal to generate the output wideband signal. Any monotonic The 

narrowband module comprises a signal interpolation module producing an interpolated narrowband signal, an inverse 
filter that filters the interpolated narrowband signal and a nonlinear operation module that generates an excitation 
signal from the filtered interpolated narrowband signal. The system further comprises a module for producing 

wideband coefficients. The wideband the wideband coefficients and the wideband excitation signal to synthesize a 

wideband signal. A highpass filter and gain module filters the wideband signal and adjusts the gain of the resulting 

highband signal. A summer sums using the wideband LPCs, and a wideband excitation signal generated from the 

narrowband signal; highpass filter the synthesized wideband signal to generate the synthesized highband signal; and 

sum the synthesized highband LPCs, synthesizing a wideband signal using the wideband LPCs and a wideband 

residual signal, highpass filtering the synthesized wideband signal to generate a synthesized highband signal, and 
generating the wideband signal of the present invention; 
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Fig. 9 shows the frequency response of a low pass interpolation filter; 

Fig. 10 shows the frequency response of an Intermediate Reference System (IRS), an IRS compensation filter and the 
cascade of the two; 

Fig. 11 is a flowchart representing an exemplary method ... parameters. In the training phase, wideband speech signals 

and the corresponding narrowband signals, obtained by lowpass filtering, are available so that the relationship between 
the corresponding parameter sets could be determined. 

Some synthesizing a wideband signal using an LPC synthesis approach followed by highpass and spectral shaping 

filters. The method according to the present invention also belongs to this category of parametric with converted 

first to LP parameters. These LP parameters are then used to construct a synthesis filter, which needs to be excited by a 
suitable wideband excitation signal. 

Two alternative approaches, commonly and 5B. First, as shown in Fig. 5A, the narrowband input speech signal is 

inverse filtered 72 using previously extracted LP coefficients to obtain a narrowband residual signal. This is 
accomplished flattening can be done by applying an LPC analysis to this signal, followed by inverse filtering. 

A second and preferred alternative is shown in Fig. 5B. It is useful for reducing need to perform the necessary 

additional interpolation in the first scheme. To perform the inverse filtering 84, the option exists in this case for either 
using the wideband LP parameters obtained from the mapping stage to get the inverse filter coefficients, or inserting 

zeros, like in spectral folding, into the narrowband LP coefficient vector. The when a nonlinear operator is used, 

i.e., using the original LP coefficients for inverse filtering 72 the input narrowband signal followed by interpolation. 
The bandwidth of the resulting residual signal.. .637-655, 1971; and H. Wakita, Direct Estimation of the Vocal Tract 
Shape by Inverse Filtering of Acoustic Speech Waveform, IEEE Trans. Audio and Electroacoust., vol. AU-21, No. 5, 

pp. ... sections of equal length, as schematically shown in Fig. 6. Moreover, an equivalence of the filtering process by 
the acoustic tube and by the LP all-pole filter model of the pre-emphasized speech has been shown to exist under the 
constraint: M ... to compensate for the glottal wave shape and lip radiation. Typically, a fixed pre-emphasis filter is 
used, usually of the form 1 - μz^-1, where μ is chosen to ... of the original wideband speech signal from which 
the narrowband signal was extracted by lowpass filtering. Using the approach according to the present invention, one 
can find a refinement as demonstrated by ... upsampling 112, for example, by inserting a zero sample following each 
input sample and lowpass filtering at 4 kHz, yielding the narrowband interpolated signal S̃_nb. The symbol "~" 
relates to narrowband interpolated signals. Because of the spectral folding caused by upsampling, high energy formants 
at low frequencies, typically present in voiced speech, are reflected to high frequencies and need to be strongly 
attenuated by the lowpass filter (not shown). Otherwise, relatively strong undesired signals may appear in the 
synthesized highband. 

Preferably, the lowpass filter is designed using the simple window method for FIR filter design, using a window 
function with sufficiently high sidelobe attenuation, like the Blackman window. See ... increases with frequency, as 
desired here. The frequency response of a length-129 FIR lowpass filter designed with a Blackman window and used in 
simulations is shown in Fig. 9. 
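
The interpolation step described above (insert a zero after each input sample, then lowpass filter at 4 kHz with a Blackman-windowed FIR filter of length 129) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the cutoff of 0.5 (as a fraction of the 8 kHz wideband Nyquist frequency) is an assumption that corresponds to 4 kHz at a 16 kHz sampling rate, and the gain of 2 after zero insertion is a common convention rather than something stated in the excerpt.

```python
import numpy as np

def blackman_lowpass(num_taps=129, cutoff=0.5):
    """Window-method FIR lowpass; cutoff is a fraction of the Nyquist frequency."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    ideal = cutoff * np.sinc(cutoff * n)      # ideal lowpass impulse response
    h = ideal * np.blackman(num_taps)         # taper with a Blackman window
    return h / h.sum()                        # normalize to unity gain at DC

def interpolate_by_two(x_nb, h):
    """Zero-insertion upsampling by 2 followed by lowpass filtering.

    Assumes the upsampled signal is at least as long as the filter, so that
    mode="same" returns exactly 2 * len(x_nb) samples.
    """
    up = np.zeros(2 * len(x_nb))
    up[::2] = x_nb                            # a zero follows each input sample
    return 2.0 * np.convolve(up, h, mode="same")  # gain 2 restores the amplitude

h = blackman_lowpass()
s_nb = np.cos(2 * np.pi * 500 * np.arange(400) / 8000)  # hypothetical 500 Hz tone at 8 kHz
s_nb_interp = interpolate_by_two(s_nb, h)               # same tone at 16 kHz
```

Zero insertion mirrors the 0 to 4 kHz band into 4 to 8 kHz; the lowpass filter is what removes that image, which is why the excerpt stresses high stopband attenuation.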

In ... frame update is used. The signal is first pre-emphasized using a first-order FIR filter 1 - μz^-1, with μ = 
ρ_1, where, as mentioned above, ρ_1 is ... in this paragraph are all performed by the LPC analysis module 
114. The corresponding inverse filter transfer function is then given by A_nb(z): ... However, to generate the LPC 
residual signal ... higher sampling rate (f_s^wb = 16 kHz if f_s^nb = 8 kHz), the interpolated signal is inverse filtered by 
A_nb(z^2), as shown by block 126. The filter coefficients, which are denoted by a_nb ↑ 2, are simply 
obtained from a_nb by ... i.e., inserting zeros - as done for spectral folding. Thus, the coefficients of the inverse filter 
A_nb(z^2), operating at the high sampling frequency, including the unity leading term, are: ... The Mnb, not to be 
confused with A_nb(z) in Eq. (3), which denotes the inverse-filter transfer function, are computed 116 from the 
partial correlation coefficients (parcors) of the narrowband signal ... coefficients represent a wideband spectral 
envelope. 
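
The coefficient manipulation described above, obtaining the inverse filter for the higher sampling rate by inserting zeros between the narrowband LP coefficients, amounts to replacing z by z^2 in the transfer function. A minimal sketch follows, assuming the coefficient vector includes the unity leading term; the function name and the example coefficients are hypothetical.

```python
import numpy as np

def upsample_inverse_filter(a_nb):
    """Map coefficients [1, a1, ..., ap] of A(z) to those of A(z^2)."""
    a_nb = np.asarray(a_nb, dtype=float)
    a_up = np.zeros(2 * len(a_nb) - 1)
    a_up[::2] = a_nb          # original coefficients land on even powers of z^-1
    return a_up

a_nb = np.array([1.0, -0.9, 0.4])       # hypothetical second-order inverse filter
a_up = upsample_inverse_filter(a_nb)    # coefficients 1, 0, -0.9, 0, 0.4
```

The frequency response of the zero-inserted filter at frequency w equals that of the original filter at 2w, so the narrowband spectral envelope is applied to the lower half of the new band and mirrored into the upper half.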

To synthesize the highband signal, the wideband LPC synthesis filter 122, which uses these coefficients, needs to be 

excited by a signal that has energy ... the whole upper band, is a desired feature and eliminates the need to apply any 

filtering in addition to highpass filtering 134. Fullwave rectification is preferred. A memoryless nonlinearity maintains 

signal periodicity, thus avoiding artifacts caused ... present invention also takes into account that the highband signal 

of natural wideband speech has pitch dependent time-envelope modulation, which is preserved by the nonlinearity. 
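
Two properties claimed above for a memoryless nonlinearity, that it keeps the signal periodic and that it creates energy in the upper band, can be illustrated with a synthetic signal. This is a minimal sketch under stated assumptions: the band-limited harmonic signal merely stands in for an interpolated narrowband residual, and the 200 Hz pitch, the harmonic count, and the 4.5 kHz band edge are hypothetical choices for the demonstration.

```python
import numpy as np

fs_wb = 16000                       # wideband sampling rate in Hz
f0 = 200.0                          # hypothetical pitch
period = int(fs_wb / f0)            # 80 samples per pitch period
n = np.arange(20 * period)          # exactly 20 pitch periods

# Stand-in for the residual: harmonics of f0 up to 3800 Hz, nothing above 4 kHz.
r_nb = sum(np.cos(2 * np.pi * k * f0 * n / fs_wb + k) for k in range(1, 20))

# Fullwave rectification (absolute value) is a memoryless nonlinearity.
r_wb = np.abs(r_nb)

freqs = np.fft.rfftfreq(len(n), d=1.0 / fs_wb)
upper = freqs > 4500.0
high_in = np.sum(np.abs(np.fft.rfft(r_nb))[upper] ** 2)    # essentially zero
high_out = np.sum(np.abs(np.fft.rfft(r_wb))[upper] ** 2)   # clearly non-zero
```

Because the operator acts sample by sample, the rectified signal repeats with the same 80-sample period as the input, so the new upper-band components are harmonics of the same pitch.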

The inventor's preference of fullwave rectification over the ... of spectral tilt is desired, then either the wideband 
excitation can be flattened via inverse filtering, as discussed above, or infinite clipping can be used having the 
characteristics shown in Fig. ... is not identical to the original input narrowband signal, the synthesized signal is 
preferably highpass filtered 134 and the resulting highband signal, S_hb, is gain adjusted 134 and added 136 to ... 
...put signal Ŝ_wb. Note that like the gain factor, also the highpass filter can be applied either before or 
after the wideband LPC synthesis block. 
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While Fig. 8 done in Fig. 8 (the HPF needs then to be replaced by a proper shaping filter to attenuate high 

frequencies, as discussed earlier). The use of spectral folding is, of course ... Fig. 8 on the above residual signal (i.e., 
obtained by using a_wb), but highpass filter its output, and combine it (after proper gain adjustment) with the 
interpolated narrowband residual signal ... the wideband excitation signal r_wb. This signal is fed then into the 
wideband LPC synthesis filter. Here again the resulting signal, y_wb, can serve as the desired output signal. 

Various components ... narrowband module from Fig. 8 may comprise the 1:2 interpolation block 112, the inverse 
filter 126 and the elements 128, 130 and 132 to generate an excitation signal from the signal. 

Another way to generate a highband signal is to excite the wideband LPC synthesis filter (constructed from the 
wideband LPC coefficients) by white noise and apply highpass filtering to the synthesized signal. While this is a 
well-known simple technique, it suffers from ... Fig. 9 illustrates a graph 138 that includes the frequency response of a low pass 

interpolation filter used for 2:1 signal interpolation. Preferably, the filter is a half-band linear-phase FIR filter, 
designed by the window method using a Blackman window. 

When the narrowband speech is obtained ... sector of the International Telecommunication Union), for analog 

telephone channels. The frequency response of a filter that simulates the IRS characteristics is shown in Fig. 10 as a 

dashed line 146 ... mitigate them. Also shown in Fig. 10 are the frequency response associated with a compensation 

filter 142 and the response associated with the cascade of the two (compensated response). 

One aspect ... to the above, and independently of it, it is useful to use an extended highpass filter, having a cutoff 

frequency F_c matched to the upper edge of the signal band (3 ... addition to an IRS channel response 146, Fig. 10 

shows the response of a compensating filter 142 and the resulting compensated response 144, which is flat in the 
nominal range. The compensation filter designed here is an FIR filter of length 129. This number could be lowered 
even to 65, with only little effect. The compensated signal becomes then the input to the bandwidth extension system. 

This filtering of the output signal from a telephone channel would then be added as a block ... art, the lowerband 
signal may be generated by just applying a narrow (300 Hz) lowpass filter to the synthesized wideband signal in 
parallel to the highpass filter 134 in Fig. 8. Other known work in the art addresses this issue more carefully ... H. 
Yasukawa, Restoration of Wide Band Signal from Telephone Speech using Linear Prediction Residual Error Filtering, 
in Proc. IEEE Digital Signal Processing Workshop, pp. 176-178, 1996. This approach includes adding to the proposed 
system a 300 Hz LPF in parallel to the existing highpass filter. However, because the nonlinear operator injects also 
undesired components into the lowband (as excitation), audible... input signal, S_nb, by a factor, such as a factor of 2 
(upsampling and lowpass filtering). This step results in a narrowband interpolated signal S̃_nb. The signal is inverse filtered 
(166) using, for example, a transfer function of A_nb(z^2) having the coefficients shown in ... rate. 

Next, a non-linear operation is applied to the signal output from the inverse filter. The operation comprises fullwave 

rectification (absolute value) of residual signal (168). Other nonlinear operators discussed ... to signal rectification (as 

discussed below) via LPC analysis of the rectified signal and inverse filtering. The preferred setting here is no spectral 
tilt compensation. 

Next, the highband signal must be added (174) to the original narrowband signal. This step comprises exciting a 
wideband LPC synthesis filter (170) (with coefficients a_wb) by the generated wideband excitation signal r_wb, 
resulting in a wideband ... may undergo further processing. If further processing is desired, the wideband signal y_wb 
is highpass filtered (172) using a HPF having its cutoff frequency at F_c to generate a highband signal ... original 
highband signal, which is maintained also in the generated highband signal. 
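
The last steps of the flow described above (excite the wideband LPC synthesis filter with the wideband excitation, highpass filter the result, adjust the gain, and add the highband to the interpolated narrowband signal) can be sketched as follows. This is an illustrative sketch only. The function names, the highpass design (a unit impulse minus a Blackman-windowed lowpass of length 129), the cutoff of 0.5 of the Nyquist frequency standing in for F_c, and the fixed gain are all assumptions, not details confirmed by the excerpt.

```python
import numpy as np

def allpole_synthesis(excitation, a):
    """LPC synthesis 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k], a[0] == 1."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def highpass_from_lowpass(num_taps=129, cutoff=0.5):
    """Linear-phase highpass built as (unit impulse) - (windowed-sinc lowpass)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    lp = cutoff * np.sinc(cutoff * n) * np.blackman(num_taps)
    hp = -lp
    hp[(num_taps - 1) // 2] += 1.0
    return hp

def extend_bandwidth(s_nb_interp, r_wb, a_wb, gain=1.0):
    """Add a synthesized highband to the interpolated narrowband signal.

    Assumes both inputs have the same length, at least the filter length.
    """
    y_wb = allpole_synthesis(r_wb, a_wb)                     # synthesized wideband signal
    s_hb = gain * np.convolve(y_wb, highpass_from_lowpass(), mode="same")
    return s_nb_interp + s_hb                                # narrowband + highband
```

The symmetric highpass filter has an integer group delay, and the centred convolution compensates for it, which is what allows the highband to be added sample by sample to the narrowband signal in this sketch.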

Applying a dispersion filter such as an allpass nonlinear-phase filter, as in the 2400 bps DoD standard MELP coder, 
for example, can mitigate the spiky ... wideband LPCs a_i^wb and the wideband excitation signal, generating a highband 
signal S_hb by highpass filtering y_wb, adjusting the gain and generating the wideband signal by summing the 
synthesized highband signal ... a signal obtained by passing a white Gaussian signal, v(n), through a half-band lowpass 
filter are discussed followed by some specific nonlinear memoryless operators, namely generalized rectification, 
defined below, and ... McGraw-Hill, New York, 1965 ("Papoulis"). 

Referring to Fig. 18, the signal v(n) is lowpass filtered 320 to produce x(n) and then passed through a nonlinear 
operator 322 to produce a signal z(n). The lowpass filtered signal x(n) has, ideally, a flat spectral magnitude for -π/2 
≤ θ ≤ π/2 ... v(n) has zero mean and variance σ_v^2, and that the half-band lowpass filter is ideal, the 
autocorrelation functions of v(n) and x(n) are: ... where δ(m ... in mitigating the 'spectral gap' near 4 kHz. It also 
helps when a narrow lowpass filter is used to extract from the synthesized wideband signal a synthetic lowband (0 - 
300 Hz ... to be useful. It can be added to the bandwidth extension system as a preprocessing filter at its input, as 
demonstrated herein. 

It should be noted that when the input signal... 
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Specification: ...and determining its amplitude based on the amplitudes of the surrounding narrowband samples via 

lowpass filtering. However, there is weakness in the interpolated speech in that it does not contain any by a factor 

of 2 by inserting a zero sample following each input sample, highpass filtering with additional spectral shaping 38, and 

gain adjustment 40. Since the spectral folding operation reflects lower band into the upper band, i.e., highband, the 

purpose of the spectral shaping filter is to attenuate these signals in the highband. To reduce the spectral-gap about 

4kHz in the art. See, e.g., H. Yasukawa, Quality Enhancement of Band Limited Speech by Filtering and Multirate 

Techniques, in Proc. Intl. Conf. Spoken Language Processing, ICSLP '94, pp. 1607-1610 ... input signal. Preferably, 

fullwave rectification is used for this purpose. Again, highpass and spectral shaping filters 48 with a gain adjustment 

50 are applied to the rectified signal to generate a wideband excitation signal, to be shaped by the generated 

wideband spectral envelope 58. Highpass filtering and gain 60 extract the highband signal for combining with the 

original narrowband signal to logarithmic, typically extracted from an LP model. Almost all parametric techniques 

use an LPC synthesis filter for wideband signal generation (typically an intermediate wideband signal which is further 
highpass filtered), by exciting it with an appropriate wideband excitation signal. 

Parametric methods can be further classified for synthesizing the highband signal. The synthesis is carried out by 

exciting the LPC synthesis filter by a wideband excitation signal. The excitation signal is obtained by inverse filtering 

the input narrowband signal and spectral folding the resulting residual signal. The main disadvantage of air 

turbulences at constrictions in the vocal tract provide the excitation for unvoiced sounds. By filtering the speech signal 
with an inverse filter, whose coefficients are determined from the LPC model, the effect of the formants is 
removed... LPC coefficients are used for synthesizing a wideband signal. The synthesized wideband signal is highpass 
filtered and summed with the original narrowband signal to generate the output wideband signal. Any monotonic ... 
...of the present invention; 

Fig. 9 shows the frequency response of a low pass interpolation filter; 

Fig. 10 shows the frequency response of an Intermediate Reference System (IRS), an IRS compensation filter and the 
cascade of the two; 

Fig. 11 is a flowchart representing an exemplary method ... In the training phase, wideband speech signals and the 

corresponding narrowband signals, obtained by lowpass filtering, are available so that the relationship between the 
corresponding parameter sets could be determined. 

Some synthesizing a wideband signal using an LPC synthesis approach followed by highpass and spectral shaping 

filters. The method according to the present invention also belongs to this category of parametric with converted 

first to LP parameters. These LP parameters are then used to construct a synthesis filter, which needs to be excited by a 
suitable wideband excitation signal. 

Two alternative approaches, commonly and 5B. First, as shown in Fig. 5A, the narrowband input speech signal is 

inverse filtered 72 using previously extracted LP coefficients to obtain a narrowband residual signal. This is 
accomplished flattening can be done by applying an LPC analysis to this signal, followed by inverse filtering. 

A second and preferred alternative is shown in Fig. 5B. It is useful for reducing need to perform the necessary 

additional interpolation in the first scheme. To perform the inverse filtering 84, the option exists in this case for either 
using the wideband LP parameters obtained from the mapping stage to get the inverse filter coefficients, or inserting 

zeros, like in spectral folding, into the narrowband LP coefficient vector. The when a nonlinear operator is used, 

i.e., using the original LP coefficients for inverse filtering 72 the input narrowband signal followed by interpolation. 

The bandwidth of the resulting residual signal ... 637-655, 1971; and H. Wakita, Direct Estimation of the Vocal Tract 
Shape by Inverse Filtering of Acoustic Speech Waveform, IEEE Trans. Audio and Electroacoust., vol. AU-21, No. 5, 
pp. ... sections of equal length, as schematically shown in Fig. 6. Moreover, an equivalence of the 
filtering process by the acoustic tube and by the LP all-pole filter model of the pre-emphasized speech has been shown 
to exist under the constraint: M ... to compensate for the glottal wave shape and lip radiation. Typically, a fixed 
pre-emphasis filter is used, usually of the form 1 - μz^-1, where μ is chosen to ... of the original wideband 
speech signal from which the narrowband signal was extracted by lowpass filtering. Using the approach according to 
the present invention, one can find a refinement as demonstrated by ... upsampling 112, for example, by inserting a 
zero sample following each input sample and lowpass filtering at 4 kHz, yielding the narrowband interpolated signal 
S̃_nb. The symbol "~" relates to narrowband interpolated signals. Because of the spectral folding caused 
by upsampling, high energy formants at low frequencies, typically present in voiced speech, are reflected to high 
frequencies and need to be strongly attenuated by the lowpass filter (not shown). Otherwise, relatively strong undesired 
signals may appear in the synthesized highband. 

Preferably, the lowpass filter is designed using the simple window method for FIR filter design, using a window 
function with sufficiently high sidelobe attenuation, like the Blackman window. See ... increases with frequency, as 
desired here. The frequency response of a length-129 FIR lowpass filter designed with a Blackman window and used in 
simulations is shown in Fig. 9. 

In ... frame update is used. The signal is first pre-emphasized using a first-order FIR filter 1 - μz^-1, with 
μ = ρ_1, where, as mentioned above, ρ_1 is the correlation ... in this paragraph are all performed by the 
LPC analysis module 114. The corresponding inverse filter transfer function is then given by A_nb(z): A_nb(z) = 1 + 
Σ_i ... kHz if f_s^nb = 8 kHz), the interpolated signal S̃_nb is inverse filtered by A_nb(z^2), as shown 
by block 126. The filter coefficients, which are denoted by a_nb ↑ 2, are simply obtained from a... 
...i.e., inserting zeros - as done for spectral folding. Thus, the coefficients of the inverse filter A_nb(z^2), operating 
at the high sampling frequency, including the unity leading term ... not to be confused with A_nb(z) in Eq. (3), 
which denotes the inverse-filter transfer function, are computed 116 from the partial correlation coefficients (parcors) 
of the narrowband signal ... coefficients represent a wideband spectral envelope. 

To synthesize the highband signal, the wideband LPC synthesis filter 122, which uses these coefficients, needs to be 

excited by a signal that has energy ... the whole upper band, is a desired feature and eliminates the need to apply any 

filtering in addition to highpass filtering 134. Fullwave rectification is preferred. A memoryless nonlinearity maintains 

signal periodicity, thus avoiding artifacts caused ... present invention also takes into account that the highband signal 

of natural wideband speech has pitch dependent time-envelope modulation, which is preserved by the nonlinearity. 

The inventor's preference of fullwave rectification over the ... of spectral tilt is desired, then either the wideband 
excitation can be flattened via inverse filtering, as discussed above, or infinite clipping can be used having the 
characteristics shown in Fig. ... is not identical to the original input narrowband signal, the synthesized signal is 
preferably highpass filtered 134 and the resulting highband signal, S_hb, is gain adjusted 134 and added 136 ... 
output signal Ŝ_wb. Note that like the gain factor, also the highpass filter can be applied either before or 
after the wideband LPC synthesis block. 

While Fig. 8... done in Fig. 8 (the HPF needs then to be replaced by a proper shaping filter to attenuate high 
frequencies, as discussed earlier). The use of spectral folding is, of course ... above residual signal r̃_nb (i.e., 
obtained by using a_wb), but highpass filter its output, and combine it (after proper gain adjustment) with the 
interpolated narrowband residual signal ... wideband excitation signal r_wb. This signal is fed then into the 
wideband LPC synthesis filter. Here again the resulting signal, y_wb, can serve as the desired output signal. 

Various ... narrowband module from Fig. 8 may comprise the 1:2 interpolation block 112, the inverse filter 126 and 
the elements 128, 130 and 132 to generate an excitation signal from the signal. 

Another way to generate a highband signal is to excite the wideband LPC synthesis filter (constructed from the 
wideband LPC coefficients) by white noise and apply highpass filtering to the synthesized signal. While this is a 
well-known simple technique, it suffers from ... Fig. 9 illustrates a graph 138 that includes the frequency response of a low pass 

interpolation filter used for 2:1 signal interpolation. Preferably, the filter is a half-band linear-phase FIR filter, 
designed by the window method using a Blackman window. 

When the narrowband speech is obtained ... sector of the International Telecommunication Union), for analog 

telephone channels. The frequency response of a filter that simulates the IRS characteristics is shown in Fig. 10 as a 

dashed line 146 ... mitigate them. Also shown in Fig. 10 are the frequency response associated with a compensation 

filter 142 and the response associated with the cascade of the two (compensated response). 

One aspect ... to the above, and independently of it, it is useful to use an extended highpass filter, having a cutoff 

frequency F_c matched to the upper edge of the signal band ... addition to an IRS channel response 146, Fig. 10 

shows the response of a compensating filter 142 and the resulting compensated response 144, which is flat in the 
nominal range. The compensation filter designed here is an FIR filter of length 129. This number could be lowered 
even to 65, with only little effect. The compensated signal becomes then the input to the bandwidth extension system. 

This filtering of the output signal from a telephone channel would then be added as a block art, the lowerband 

signal may be generated by just applying a narrow (300 Hz) lowpass filter to the synthesized wideband signal in 

parallel to the highpass filter 134 in Fig. 8. Other known work in the art addresses this issue more carefully H. 

Yasukawa, Restoration of Wide Band Signal from Telephone Speech using Linear Prediction Residual Error Filtering, 
in Proc. IEEE Digital Signal Processing Workshop, pp. 176-178, 1996. This approach includes adding to the proposed 
system a 300 Hz LPF in parallel to the existing highpass filter. However, because the nonlinear operator injects also 

undesired components into the lowband (as excitation), audible ... signal, S_nb, by a factor, such as a factor of 2 
(upsampling and lowpass filtering). This step results in a narrowband interpolated signal S̃_nb. The signal 
S̃_nb is inverse filtered (166) using, for example, a transfer function of A_nb(z^2) having the coefficients... 
...rate. 

Next, a non-linear operation is applied to the signal output from the inverse filter. The operation comprises fullwave 
rectification (absolute value) of residual signal S̃_nb (168). Other ... to signal rectification (as discussed 
below) via LPC analysis of the rectified signal and inverse filtering. The preferred setting here is no spectral tilt 
compensation. 

Next, the highband signal must be added (174) to the original narrowband signal. This step comprises exciting a 
wideband LPC synthesis filter (170) (with coefficients a_wb) by the generated wideband excitation signal r_wb, 
resulting in ... undergo further processing. If further processing is desired, the wideband signal y_wb is highpass 
filtered (172) using a HPF having its cutoff frequency at F_c to generate a highband ... original highband signal, 
which is maintained also in the generated highband signal. 

Applying a dispersion filter such as an allpass nonlinear-phase filter, as in the 2400 bps DoD standard MELP coder, 
for example, can mitigate the spiky ... a_i^wb and the wideband excitation signal, generating a highband signal S_hb by 
highpass filtering y_wb, adjusting the gain and generating the wideband signal by summing the synthesized 
highband ... a signal obtained by passing a white Gaussian signal, v(n), through a half-band lowpass filter are 
discussed followed by some specific nonlinear memoryless operators, namely generalized rectification, defined below, 
and ... McGraw-Hill, New York, 1965 ("Papoulis"). 

Referring to Fig. 18, the signal v(n) is lowpass filtered 320 to produce x(n) and then passed through a nonlinear 
operator 322 to produce a signal z(n). The lowpass filtered signal x(n) has, ideally, a flat spectral magnitude for -π/2 
≤ θ ≤ π/2 ... v(n) has zero mean and variance σ_v^2, and that the half-band lowpass filter is ideal, the 
autocorrelation functions of v(n) and x(n) are: R_v(m) ... in mitigating the 'spectral gap' near 4 kHz. It also helps when a 
narrow lowpass filter is used to extract from the synthesized wideband signal a synthetic lowband (0 - 300 Hz ... to be 
useful. It can be added to the bandwidth extension system as a preprocessing filter at its input, as demonstrated herein. 

It should be noted that when the input signal... 

Claims: ...wideband signal from a narrowband signal of claim 16, the method further comprising: 

(8) highpass filtering the wideband signal ywb)) to generate a highband signal; and 

(9) combining the highband signal the bandwidth of a narrowband signal of claim 21, the method further 

comprising: 

(7) highpass filtering the wideband signal ywb)) to produce a highband signal; and 

(8) combining the highband signal wideband LPCs; and 

(5) synthesizing a wideband signal ywb)) using the wideband LPCs and highpass 

filtered white noise in the higher band of an excitation signal and a linear prediction residual 24, wherein 

computing the excitation signal from a narrowband prediction residual signal further comprises inverse filtering the 
narrowband signal. 

26. A method of producing a wideband signal from a narrowband signal wideband signal y_wb from the wideband

LPCs a_i^wb and the wideband excitation signal;

(8) highpass filtering the wideband signal y_wb to produce a highband signal; and

(9) generating a wideband signal narrowband signal to produce an upsampled narrowband signal; 

producing a narrowband residual signal by inverse filtering the upsampled interpolated narrowband signal using a 

transfer function associated with the a_i^wb LP coefficients; and wideband signal from a narrowband signal of claim

28, the method further comprising: 

(10) highpass filtering the wideband signal y_wb to generate a highband signal S_hb; and

(11) generating a wideband signal to produce an upsampled interpolated narrowband signal; 

producing a narrowband residual signal by inverse filtering the upsampled interpolated narrowband signal using a 
transfer function associated with the a_i^wb LP coefficients...

Claims: ...excitation signal. 

14. A method as claimed in claim 13, the method further comprising: highpass filtering the wideband signal y_wb to
generate a highband signal; and 



combining the highband signal excitation signal.

19. A method as claimed in claim 18, the method further comprising: highpass filtering the wideband signal y_wb to
produce a highband signal; and 



combining the highband signal wideband LPCs; and 



synthesizing the wideband signal y_wb using the wideband LPCs and highpass filtered white noise in the higher

band of an excitation signal and a linear prediction residual 21, wherein computing the excitation signal from a 

narrowband prediction residual signal further comprises inverse filtering the narrowband signal. 

23. A method as claimed in claim 2, wherein the step of y_wb from the wideband LPCs a_i^wb and



the wideband excitation signal; 



highpass filtering the wideband signal y_wb to produce a highband signal; and



generating a wideband signal produce an upsampled narrowband signal; 



producing a narrowband residual signal r(tilde)_nb by inverse filtering the upsampled interpolated narrowband signal
using a transfer function associated with the a_i^wb LP excitation signal.

26. A method as claimed in claim 25, the method further comprising: highpass filtering the wideband signal y_wb to
generate a highband signal S_hb; and



generating a an upsampled interpolated narrowband signal; 



producing a narrowband residual signal r(tilde)_nb by inverse filtering the upsampled interpolated narrowband signal
using a transfer function associated with the a_i^wb...

Claims: ...as claimed in claim 21, wherein computing the excitation signal from a narrowband prediction residual
signal further comprises inverse filtering the narrowband signal.

23. A method as claimed in claim 2, wherein the step of computing the M to produce an upsampled

narrowband signal;

producing a narrowband residual signal r(tilde)_nb by inverse filtering the upsampled interpolated

narrowband signal using a transfer function associated with the a_i^wb linear prediction coefficients to produce an

upsampled interpolated narrowband signal;

producing a narrowband residual signal r(tilde)_nb by inverse filtering the upsampled interpolated
narrowband signal using a transfer function associated with the a_i^wb linear prediction coefficients...

Specification: ...selected on the basis of the weighted sum value of seven characteristic amounts including a low
frequency speech energy, a zero-cross ratio, and the like.

Although the coding system select methods as described limited even if recovery processing of pitch information is 

performed with use of a post filter in the decoding side. 

Further, if coded data transferred with a ...In the present invention, a reference vector is extracted from an adaptive 
codebook and is filtered by the synthesize filter from which a synthesize signal is generated, and the similarity 

between the synthesize signal and easily attained even if the bit number assigned to a drive signal of the synthetic 

filter is reduced. In brief, the coding bit rate can be lowered. Inversely, when a target in case where the calculation 

amount for deciding a reference vector inputted into a synthesize filter is relatively large, and the calculation amount 
for selecting a coding scheme is remarkably small... an output terminal 13. The selection section 11 comprises an
adaptive codebook 14, a synthesis filter 15, a similarity calculator 106, and a coding scheme determining circuit 17.

In the next signal q(n) is generated from the vector p(n), by means of a synthesis filter 105. As an example,

operation of the synthesis filter 105 can be expressed by the following equation (1) with respect to a z-transform a

step S11, and then, the vector p(n) is passed through a synthesis filter 105, to prepare a synthesis vector q(n). Next,
an optimum gain g to be... and outputs T which minimizes the power E of a prediction error signal of the prediction, as
a pitch period. Specifically, the prediction error signal power E is expressed as follows. Here, g denotes a pitch gain

and n) in case where the target signal r(n) is weighted by a hearing weighting filter. In addition, since envelope 

information 0 of a speech signal can be removed with use v(n) obtained by making an input speech signal u(n) pass 

through an LPC prediction filter, far better pitch analysis can be achieved. Accordingly, in this embodiment, an

input speech signal u(n) or Further, in this embodiment, although explanation has been made to a case where a 

primary pitch prediction filter is used in the pitch analyzer 22, a prediction filter of a higher order may be used. 
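
The pitch analysis described above (choosing the pitch period T that minimizes the prediction error power E, with g as the pitch gain) can be sketched as follows. The excerpt elides the actual expression for E, so this is only a minimal illustration that assumes the common first-order form E(T) = sum over n of (v(n) - g v(n-T))^2 with the gain chosen per lag by least squares; the function name `pitch_search` and the lag range are hypothetical.

```python
def pitch_search(v, t_min, t_max):
    """Return (T, g, E) for the lag T in [t_min, t_max] with the lowest error power.

    Assumed model (not taken from the source): one-tap predictor g * v(n - T),
    error power E(T) = sum_n (v(n) - g * v(n - T))**2 over a fixed window.
    """
    best = None
    window = range(t_max, len(v))  # same window for every lag, so the E values are comparable
    for T in range(t_min, t_max + 1):
        num = sum(v[n] * v[n - T] for n in window)
        den = sum(v[n - T] ** 2 for n in window)
        if den == 0.0:
            continue  # no energy in the delayed segment; gain undefined
        g = num / den  # least-squares optimal gain for this lag
        E = sum((v[n] - g * v[n - T]) ** 2 for n in window)
        if best is None or E < best[2]:
            best = (T, g, E)
    return best


# Usage: a pulse train with period 5 should yield T = 5 and a gain close to 1.
signal = [1.0, 0.0, 0.0, 0.0, 0.0] * 8
T, g, E = pitch_search(signal, t_min=2, t_max=8)
```

A higher-order predictor, as the text allows, would replace the single gain with several taps around the lag T.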

FIG. 5 is a block diagram showing the candidates, synthesis vectors are respectively obtained with respect to the 

reference vectors by the synthesis filter 15, and the synthesis vector most similar to the target vector r(n) is searched... 
an adaptive codebook 14, and coding scheme selection information 1 is outputted through a synthesis filter 15, a
similarity calculator 16, and a coding scheme selection section 17 on the basis of the reference vector p(n). The
processing performed by the synthesis filter 15, the similarity calculator 16, and the coding scheme selection section 17

are respectively the signal may be of a signal which has been passed through a hearing weighting filter and on

which influences from a previous frame have been reduced, in several cases. Those... which will be referred to as an LPC
coefficient, hereinafter) is obtained thereby. A synthesis filter 63 whose characteristic is defined by the LPC coefficient 

is inputted with an adaptive vector each other by an adder 73, thereby to generate a drive signal for a synthesis 

filter 75. 

Meanwhile, the characteristic of the synthesis filter 75 is defined on the basis of an LPC coefficient obtained by 

quantizing an LPC 74, and a drive signal outputted from an adder 73 is inputted into the synthesis filter 75, thereby 

generating a synthesis signal. With a signal from which influences of a previous frame, to obtain an error signal. 

The error signal is weighted by a hearing weighting filter 78, and thereafter, the electric power of the signal is obtained

by an error calculator 71 after multiplication by a pitch gain, thereby generating a drive signal for a synthesis filter 

83. In the next, an LPC coefficient is quantized by an LPC quantizer 82, and the characteristic of a synthesis filter 83 is 
defined on the basis of the LPC coefficient after the quantization. The synthesis filter 83 is inputted with a drive signal 
outputted from the multiplier 89, and a synthesis an error signal is thereby obtained. 

The error signal is weighted by a hearing weighting filter 85, and thereafter, the electric power is obtained by an error 

calculator 86. A gain codebook 67 as a component of an encoder of the CELP method and a synthesis filter 63 are 

used for selection of an encoder (or coding scheme), and therefore, it is Therefore, even if the number of bits 

assigned to a drive signal for the synthesis filter is reduced to be small, it is possible to easily attain target quality and 

to by making a reference vector obtained from the adaptive codebook 67 pass through the synthesis filter 73 and an 

input speech signal as a target signal is obtained by a similarity... method of obtaining a pitch period and a pitch gain 
with use of a primary pitch prediction filter, a higher order prediction filter may be used. In addition, another pitch 

analysis method, e.g., a zero-crossing method n). Here, explanation will be made to a case of using an all-pole pitch 

filter. The transfer function of a pole type pitch filter can be expressed as follows. Here, A(z) denotes a z-transform

of an smaller than 1, and (epsilon) = 0.8 is recommended. To avoid creating an oscillating

filter, it is necessary to monitor such that a product of g and (epsilon) is always The above explanation has been

made to a case of using a primary pitch emphasis filter. However, the number of stages of the pitch emphasis filter
need not always be one stage, but the pitch emphasis filter may be stages equal in number to the number of analysis

stages of the pitch although the above explanation has been made to a case where a pole type pitch filter is used, it

is naturally possible to use, for example, an all-zero pitch filter, pole-zero pitch filter, etc.

Although the characteristic is changed depending on the pitch gain g in the pitch... emphasis section 100 shown in FIG. 
39 has a structure obtained by adding a prediction filter 104 supplied with an input signal, a LPC analyzer 105 and a 

synthesis filter 106 to the emphasis section shown in FIG. 12. The contents of the processing will the like, and any 

of these methods can be used. Next, a prediction filter is formed from an LPC coefficient, and an input signal is
passed through the prediction filter, thereby generating a prediction residual signal d(n) (in a step
S102). The with a(n) of the equation (16) replaced with d(n).

Finally, a synthesis filter is formed from an LPC coefficient, and the pitch emphasis signal b(n) is passed through

the synthesis filter to generate a pitch-emphasized input signal e(n) (in a step S105). where n an index (number)

are extracted. The LPC coefficient a_i is supplied to an LPC synthesis filter 213. Note that P is the prediction order
and P = 10 is generally used. A transfer function for an LPC synthesis filter 213 is given by the following
equation (23).

In the next, explanation will be made... At first, an influence onto a current frame from an internal state of the synthesis
filter 213 in a previous frame is subtracted from one frame of speech signals inputted into the sub-frames. 

A drive signal vector as an input signal of an LPC synthesis filter 213 is obtained by adding a value, which is obtained 
by multiplying an adaptive vector a multiplier 210, by means of an adder 212. 

Here, the adaptive codebook 207 performs pitch prediction analysis described in the prior art reference 1, through 

closed loop operation or analysis by art reference 2). According to the reference 2, a drive signal for the LPC 

synthesis filter 213 is delayed by one sample by a delay circuit 211 for a pitch search one after another, and are

respectively multiplied by predetermined gains obtained from the multiplier 209. Filter processing is performed by an

LPC synthesis filter 213, and a synthesis signal vector is generated. The synthesis signal vector thus generated is a

subtracter 203. An output of the subtracter 203 is inputted through a hearing weighting filter 204 to an error calculator

205, and a mean squared error is obtained. Information concerning steps is multiplied by a gain, and a

synthesis speech signal vector is generated through filter calculation by the LPC synthesis filter 213. The vector thus 

generated is subtracted from a target vector, thereby resulting in a multiplication by a gain obtained from the gain 

codebook 218 by the multiplier 210, to filter calculation by the LPC synthesis filter 213. Thereafter, generation of a 

synthesis speech signal vector and calculation of an average square and 210 are each transmitted from an index 

selector 214. Note that the hearing weighting filter 204 is used to shape a spectrum of an error signal outputted from a 

subtracter adder 403 to generate a drive vector which is passed through an LPC synthesis filter 404 whose

setting is performed by an LPC coefficient transmitted from a coding section, thereby subjective quality of the

synthesis signal, the synthesis signal is passed through a post filter 405 to obtain a synthesis speech which is
outputted through an output terminal 406. Finally... 

Specification: ...selected on the basis of the weighted sum value of seven characteristic amounts including a
low frequency speech energy, a zero-cross ratio, and the like.

Although the coding system select methods as described... limited even if recovery processing of pitch information is
performed with use of a post filter in the decoding side.

Further, if coded data transferred with a transfer path code added In the present invention, a reference vector is 

extracted from an adaptive codebook and is filtered by the synthesize filter from which a synthesize signal is 

generated, and the similarity between the synthesize signal and easily attained even if the bit number assigned to a 

drive signal of the synthetic filter is reduced. In brief, the coding bit rate can be lowered. Inversely, when a target in 

case where the calculation amount for deciding a reference vector inputted into a synthesize filter is relatively large, 
and the calculation amount for selecting a coding scheme is remarkably small... an output terminal 13. The selection
section 11 comprises an adaptive codebook 14, a synthesis filter 15, a similarity calculator 106, and a coding scheme
determining circuit 17.

In the next signal q(n) is generated from the vector p(n), by means of a synthesis filter 105. As an example,

operation of the synthesis filter 105 can be expressed by the following equation (1) with respect to a z-transform a

step S11, and then, the vector p(n) is passed through a synthesis filter 105, to prepare a synthesis vector q(n). Next,
an optimum gain g to be... and outputs T which minimizes the power E of a prediction error signal of the prediction, as
a pitch period. Specifically, the prediction error signal power E is expressed as follows. Here, g denotes a pitch gain 

and n) in case where the target signal r(n) is weighted by a hearing weighting filter. In addition, since envelope 

information 0 of a speech signal can be removed with use v(n) obtained by making an input speech signal u(n) pass 

through an LPC prediction filter, far better pitch analysis can be achieved. Accordingly, in this embodiment, an
input speech signal u(n) or Further, in this embodiment, although explanation has been made to a case where a

primary pitch prediction filter is used in the pitch analyzer 22, a prediction filter of a higher order may be used. 

FIG. 5 is a block diagram showing the candidates, synthesis vectors are respectively obtained with respect to the 

reference vectors by the synthesis filter 15, and the synthesis vector most similar to the target vector r(n) is searched... 
an adaptive codebook 14, and coding scheme selection information 1 is outputted through a synthesis filter 15, a 
similarity calculator 16, and a coding scheme selection section 17 on the basis of the reference vector p(n). The 
processing performed by the synthesis filter 15, the similarity calculator 16, and the coding scheme selection section 17 
are respectively the has been passed through a hearing weighting filter and on which influences from a previous

frame have been reduced, in several cases. Those which will be referred to as an LPC coefficient, hereinafter) is

obtained thereby. A synthesis filter 63 whose characteristic is defined by the LPC coefficient is inputted with an 
adaptive vector each other by an adder 73, thereby to generate a drive signal for a synthesis filter 75. 

Meanwhile, the characteristic of the synthesis filter 75 is defined on the basis of an LPC coefficient obtained by 

quantizing an LPC 74, and a drive signal outputted from an adder 73 is inputted into the synthesis filter 75, thereby 

generating a synthesis signal. With a signal from which influences of a previous frame, to obtain an error signal. 

The error signal is weighted by a hearing weighting filter 78, and thereafter, the electric power of the signal is obtained 
by an error calculator... 71 after multiplication by a pitch gain, thereby generating a drive signal for a synthesis filter 83.
In the next, an LPC coefficient is quantized by an LPC quantizer 82, and the characteristic of a synthesis filter 83 is 
defined on the basis of the LPC coefficient after the quantization. The synthesis filter 83 is inputted with a drive signal 
outputted from the multiplier 89, and a synthesis an error signal is thereby obtained. 

The error signal is weighted by a hearing weighting filter 85, and thereafter, the electric power is obtained by an error 

calculator 86. A gain codebook 67 as a component of an encoder of the CELP method and a synthesis filter 63 are 

used for selection of an encoder (or coding scheme), and therefore, it is Therefore, even if the number of bits 

assigned to a drive signal for the synthesis filter is reduced to be small, it is possible to easily attain target quality and 

to by making a reference vector obtained from the adaptive codebook 67 pass through the synthesis filter 73 and an 

input speech signal as a target signal is obtained by a similarity... method of obtaining a pitch period and a pitch gain
with use of a primary pitch prediction filter, a higher order prediction filter may be used. In addition, another pitch 

analysis method, e.g., a zero-crossing method n). Here, explanation will be made to a case of using an all-pole pitch 

filter. The transfer function of a pole type pitch filter can be expressed as follows. Here, A(z) denotes a z-transform

of an smaller than 1, and (epsilon) = 0.8 is recommended. To avoid creating an oscillating

filter, it is necessary to monitor such that a product of g and (epsilon) is always The above explanation has been

made to a case of using a primary pitch emphasis filter. However, the number of stages of the pitch emphasis filter
need not always be one stage, but the pitch emphasis filter may be stages equal in number to the number of analysis

stages of the pitch although the above explanation has been made to a case where a pole type pitch filter is used, it 

is naturally possible to use, for example, an all-zero pitch filter, pole-zero pitch filter , etc. 

Although the characteristic is changed depending on the pitch gain g in the pitch... emphasis section 100 shown in FIG. 
39 has a structure obtained by adding a prediction filter 104 supplied with an input signal, a LPC analyzer 105 and a 

synthesis filter 106 to the emphasis section shown in FIG. 12. The contents of the processing will the like, and any 

of these methods can be used. Next, a prediction filter is formed from an LPC coefficient, and an input signal is
passed through the prediction filter, thereby generating a prediction residual signal d(n) (in a step
S102). The with a(n) of the equation (16) replaced with d(n).

Finally, a synthesis filter is formed from an LPC coefficient, and the pitch emphasis signal b(n) is passed through
the synthesis filter to generate a pitch-emphasized input signal e(n) (in a step S105). where n... an index (number) are
extracted. The LPC coefficient a_i is supplied to an LPC synthesis filter 213. Note that P is the prediction order
and P = 10 is generally used. A transfer function for an LPC synthesis filter 213 is given by the following
equation (23).

In the next, explanation will be made At first, an influence onto a current frame from an internal state of the 

synthesis filter 213 in a previous frame is subtracted from one frame of speech signals inputted into the sub-frames. 

A drive signal vector as an input signal of an LPC synthesis filter 213 is obtained by adding a value, which is obtained 
by multiplying an adaptive vector a multiplier 210, by means of an adder 212. 

Here, the adaptive codebook 207 performs pitch prediction analysis described in the prior art reference 1, through

closed loop operation or analysis by art reference 2). According to the reference 2, a drive signal for the LPC 

synthesis filter 213 is delayed by one sample by a delay circuit 211 for a pitch search one after another, and are

respectively multiplied by predetermined gains obtained from the multiplier 209. Filter processing is performed by an 

LPC synthesis filter 213, and a synthesis signal vector is generated. The synthesis signal vector thus generated is a 

subtracter 203. An output of the subtracter 203 is inputted through a hearing weighting filter 204 to an error calculator
205, and a mean squared error is obtained. Information concerning steps is multiplied by a gain, and a

synthesis speech signal vector is generated through filter calculation by the LPC synthesis filter 213. The vector thus 

generated is subtracted from a target vector, thereby resulting in a multiplication by a gain obtained from the gain 

codebook 218 by the multiplier 210, to filter calculation by the LPC synthesis filter 213. Thereafter, generation of a 

synthesis speech signal vector and calculation of an average square and 210 are each transmitted from an index 

selector 214. Note that the hearing weighting filter 204 is used to shape a spectrum of an error signal outputted from a 
subtracter... adder 403 to generate a drive vector which is passed through an LPC synthesis filter 404 whose setting

is performed by an LPC coefficient transmitted from a coding section, thereby subjective quality of the synthesis

signal, the synthesis signal is passed through a post filter 405 to obtain a synthesis speech which is outputted
through an output terminal 406. Finally...

Specification: ...to a method and device for enhancing periodicity of the excitation of a signal synthesis filter in view

of producing a synthesized wideband signal. 2. Brief description of the prior art and wireless applications, as well 

as Internet and packet network applications. Until recently, telephone bandwidths filtered in the range 200-3400 Hz 

were mainly used in speech coding applications. However, there number (corresponding to 10-30 ms of speech). In 

CELP, a linear prediction (LP) synthesis filter is computed and transmitted every frame. The L-sample frame is then 

divided into smaller signal is transmitted and used at the decoder as the input of the LP synthesis filter in order to 

obtain the synthesized speech. 

An innovative codebook in the CELP context, is synthesize speech according to the CELP technique, each block of 

N samples is synthesized by filtering an appropriate codevector from a codebook through time varying filters 

modeling the spectral characteristics of the speech signal. At the encoder end, the synthesis output perceptually 

weighted distortion measure. This perceptual weighting is performed using a so-called perceptual weighting filter, 
which is usually derived from the LP synthesis filter. 

A known CELP-based coder is described in the document EP-A-0788091. 

The CELP improves the quality in case of voiced segments. This was done in the past by filtering the innovative 

codevector from the fixed codebook through a filter having a transfer function of the form 1/(1 - (epsilon)bz^-T) where

(epsilon) is invention is to propose a new alternative approach by which periodicity enhancement is achieved 

through filtering the innovative codevector by an innovation filter which reduces the low-frequency contents of the

innovative codevector, whereby the innovative contribution is in relation to a pitch codevector and an innovative

codevector for supplying a signal synthesis filter in view of synthesizing a wideband signal. In this periodicity enhancing
method, a periodicity factor related to the wideband signal is calculated. Then, the innovative codevector is filtered in 
relation to the periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and 
enhance periodicity of a low frequency portion of the ...excitation signal produced in relation to adaptive and 
innovative codevectors for supplying a signal synthesis filter in view of synthesizing a wideband signal, comprises: 

a) a factor generator for calculating a periodicity factor related to said wideband signal; and 

b) an innovative filter for filtering the innovative codevector in relation to the periodicity factor to thereby reduce 
energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of 
the excitation signal. 
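
For comparison, the prior-art periodicity enhancement mentioned earlier, a filter of the form 1/(1 - (epsilon)bz^-T) applied to the innovative codevector, can be sketched as a simple recursion. This is an illustration only: it assumes T is the integer pitch delay in samples and b the pitch gain, the value of (epsilon) in the usage line is arbitrary, and the function name `enhance_periodicity` is hypothetical.

```python
def enhance_periodicity(c, b, T, eps):
    """Filter codevector c through 1 / (1 - eps*b*z^-T): y(n) = c(n) + eps*b*y(n - T).

    Assumptions (not from the source): integer delay T in samples, zero filter
    memory at the start of the vector.
    """
    if T < 1:
        raise ValueError("pitch delay T must be at least one sample")
    coef = eps * b
    if abs(coef) >= 1.0:
        # The feedback coefficient must stay below 1 in magnitude,
        # otherwise the recursion does not decay.
        raise ValueError("eps * b must be smaller than 1 in magnitude")
    y = [float(s) for s in c]
    for n in range(T, len(y)):
        y[n] += coef * y[n - T]
    return y


# Usage: a single pulse is repeated every T samples with decaying amplitude.
y = enhance_periodicity([1, 0, 0, 0, 0, 0], b=1.0, T=2, eps=0.5)
```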

According to a first preferred embodiment: 

- the innovative codevector is filtered with a transfer function of the form: where (alpha) is the periodicity factor 
derived from of the innovative codevector. 

According to a second preferred embodiment: 

- the innovative codevector is filtered with a transfer function of the form: where (sigma) is a periodicity factor

derived from from this encoded wideband signal at least pitch codebook parameters, innovative codebook 

parameters, and synthesis filter coefficients; 

b) a pitch codebook responsive to the pitch codebook parameters for producing a pitch factor generator for

calculating a periodicity factor related to the wideband signal; and the innovation filter for filtering the innovative 
codevector in relation to the periodicity factor; 

e) a combiner circuit for combining the pitch codevector and the innovative codevector filtered by the innovation filter 
to thereby produce a periodicity-enhanced excitation signal; and 

f) a signal synthesis filter for filtering that periodicity-enhanced excitation signal in relation to the synthesis filter 
coefficients to thereby produce the synthesized wideband signal. 
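
The combiner and synthesis-filter steps e) and f) can be illustrated with a short sketch. It assumes the excitation is the gain-scaled sum of the pitch codevector and the (already filtered) innovative codevector, and that the synthesis filter is the all-pole filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The sign convention, the innovation gain name `g` and the function name `synthesize_subframe` are assumptions for illustration, not taken from the excerpt.

```python
def synthesize_subframe(v_T, c_k, b, g, a, state=None):
    """Build the excitation u(n) = b*v_T(n) + g*c_k(n) and filter it through 1/A(z).

    a     : [1, a_1, ..., a_p], assumed convention A(z) = 1 + sum_i a_i z^-i
    state : the p most recent past outputs, newest first (zeros if omitted)
    Returns (excitation, synthesized signal, updated state).
    """
    order = len(a) - 1
    mem = list(state) if state is not None else [0.0] * order
    excitation = [b * p + g * c for p, c in zip(v_T, c_k)]
    out = []
    for u in excitation:
        # All-pole recursion: s(n) = u(n) - sum_i a_i * s(n - i)
        s = u - sum(a[i + 1] * mem[i] for i in range(order))
        out.append(s)
        if order:
            mem = [s] + mem[:-1]
    return excitation, out, mem


# Usage with a first-order filter A(z) = 1 - 0.5 z^-1 (illustrative values only).
exc, syn, mem = synthesize_subframe([1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                                    b=0.5, g=2.0, a=[1.0, -0.5])
```

Returning the filter state lets consecutive subframes be synthesized without a discontinuity at the boundary.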

According to the present invention, in a from this encoded wideband signal at least pitch codebook parameters, 

innovative codebook parameters, and synthesis filter coefficients; a pitch codebook responsive to the pitch codebook
parameters for producing a pitch codevector codevector and the innovative codevector to thereby produce an

excitation signal; and a signal synthesis filter for filtering that excitation signal in relation to the synthesis filter 
coefficients to thereby produce the synthesized wideband signal; 

the improvement therein comprising a periodicity enhancing factor generator for calculating a periodicity factor 

related to the wideband signal; and the innovation filter for filtering the innovative codevector in relation to the 
periodicity factor before supplying this innovative codevector to... and below such as Code-Excited Linear Prediction
(CELP) encoders typically use a LP synthesis filter to model the short-term spectral envelope of the voice signal. The 

LP information is signal in the frame are computed, encoded, and transmitted. LP parameters representing the LP 

synthesis filter are usually computed once every frame. The frame is further divided into smaller blocks of.. 
...sampling, preprocessing, and preemphasis); 

s_w Weighted speech vector;

s_0 Zero-input response of weighted synthesis filter;

s_p Down-sampled pre-processed signal; Oversampled synthesized speech signal;

s' Synthesis signal before deemphasis;

x Target vector for pitch search;

x' Target vector for innovation search;

h Weighted synthesis filter impulse response;

v_T Adaptive (pitch) codebook vector at delay T;

y_T Filtered pitch codebook vector (v_T convolved with h);

c_k Innovative codevector at index k (k-th codebook index);

b Pitch gain (or pitch codebook gain);

j Index of the low-pass filter used on the pitch codevector;

...optional pre-processing block 102. Pre-processing block 102 may consist of a high-pass filter with a 50 Hz cut-off 
frequency. High-pass filter 102 removes the unwanted sound components below 50 Hz. 

The down-sampled pre-processed signal at a sampling frequency of 12.8 kHz). In a preferred embodiment of the 

preemphasis filter 103, the signal s_p(n) is preemphasized using a filter having the following transfer function: where
(mu) is a preemphasis factor with a value located between 0 and 1 (a typical value is (mu) = 0.7). A higher-order filter 
could also be used. It should be pointed out that high-pass filter 102 and preemphasis filter 103 can be interchanged to 
obtain more efficient fixed-point implementations. 
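
As an illustration of the preemphasis step, the sketch below assumes the common first-order form P(z) = 1 - (mu) z^-1, since the excerpt elides the actual transfer function, and uses the typical value (mu) = 0.7 quoted above. The matching deemphasis 1/P(z) is included only to show that the two operations invert each other; the function names are hypothetical.

```python
def preemphasize(x, mu=0.7):
    """Assumed first-order preemphasis: s(n) = x(n) - mu * x(n - 1)."""
    out, prev = [], 0.0
    for sample in x:
        out.append(sample - mu * prev)
        prev = sample
    return out


def deemphasize(s, mu=0.7):
    """Inverse filter 1 / (1 - mu z^-1): y(n) = s(n) + mu * y(n - 1)."""
    out, prev = [], 0.0
    for sample in s:
        prev = sample + mu * prev
        out.append(prev)
    return out


# Usage: a constant (low-frequency) input is attenuated after the first sample,
# and deemphasis restores the original signal.
emphasized = preemphasize([1.0, 1.0, 1.0, 1.0])
restored = deemphasize(emphasized)
```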

The function of the preemphasis filter 103 is to enhance the high frequency contents of the input signal. It also 
reduces quality. This will be explained in more detail herein below. 

The output of the preemphasis filter 103 is denoted s(n). This signal is used for performing LP analysis in calculator... 
...are computed from the windowed signal, and Levinson-Durbin recursion is used to compute LP filter coefficients, 

a_i, where i = 1, ..., p, and where p is the LP order, which is wideband coding. The parameters a_i are the coefficients

of the transfer function of the LP filter, which is given by the following relation: 

LP analysis is performed in calculator module 104, which also performs the quantization and interpolation of the LP 
filter coefficients. The LP filter coefficients are first transformed into another equivalent domain more suitable for 

quantization and interpolation purposes are two domains in which quantization and interpolation can be efficiently 

performed. The 16 LP filter coefficients, a_i, can be quantized in the order of 30 to 50 bits using split or a

combination thereof. The purpose of the interpolation is to enable updating the LP filter coefficients every subframe
while transmitting them once every frame, which improves the encoder performance without increasing the bit rate. 
Quantization and interpolation of the LP filter coefficients is believed to be otherwise well known to those of ordinary 

skill in the rest of the coding operations performed on a subframe basis. In the following description, the filter A(z)

denotes the unquantized interpolated LP filter of the subframe, and the filter A(hat)(z) denotes the quantized interpolated
LP filter of the subframe.
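
The Levinson-Durbin recursion mentioned above, which turns the autocorrelations of the windowed signal into LP filter coefficients, can be sketched as follows. This is a generic textbook version, not the implementation of the cited document; it assumes the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and the function name is hypothetical.

```python
def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelations r[0..order].

    Returns (a, err) where a = [1, a_1, ..., a_order] under the assumed
    convention A(z) = 1 + sum_i a_i z^-i, and err is the final prediction
    error energy.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err


# Usage: autocorrelations of a first-order process with pole at 0.5.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For the wideband case described in the text the order would be 16; the small order here only keeps the example easy to check by hand.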

Perceptual Weighting: 

In analysis-by-synthesis encoders, the optimum pitch and innovation and weighted synthesis speech.
The weighted signal s_w(n) is computed in a perceptual weighting filter 105. Traditionally, the weighted signal
s_w(n) is computed by a weighting filter having a transfer function W(z) in the form: where

As well known to those W^-1(z), which is the inverse of the transfer function of the perceptual weighting filter 105.

This result is well described by B.S. Atal and M.R. Schroeder in... is controlled by the factors (gamma)_1 and
(gamma)_2.

The above traditional perceptual weighting filter 105 works well with telephone band signals. However, it was found 
that this traditional perceptual weighting filter 105 is not suitable for efficient perceptual weighting of wideband 
signals. It was also found that the traditional perceptual weighting filter 105 has inherent limitations in modelling the 

formant structure and the required spectral tilt concurrently range between low and high frequencies. The prior art 

has suggested to add a tilt filter into W(z) in order to control the tilt and formant weighting of the wideband... 
...solution to this problem is, in accordance with the present invention, to introduce the preemphasis filter 103 at the 
input, compute the LP 

filter A(z) based on the preemphasized speech s(n), and use a modified filter W(z) by fixing its denominator. 

LP analysis is performed in module 104 on the preemphasized signal s(n) to obtain the LP filter A(z). Also, a new
perceptual weighting filter 105 with fixed denominator is used. An example of transfer function for the perceptual
weighting filter 105 is given by the following relation: where

A higher order can be used at... A(z) is computed based on the preemphasized speech signal s(n), the tilt of the filter

1/A(z/(gamma)_1) is less pronounced compared to the case when A(z) is computed based on the original speech. Since

deemphasis is performed at the decoder end using a filter having the transfer function: the quantization error spectrum

is shaped by a filter having a transfer function W^-1(z)P^-1(z). When (gamma)_2 is set... which is typically the case,

the spectrum of the quantization error is shaped by a filter whose transfer function is 1/A(z/(gamma)_1), with A(z)

computed based on this structure for achieving the error shaping by a combination of preemphasis and modified 

weighting filtering is very efficient for encoding wideband signals, in addition to the advantages of ease of... 
...computed. This is usually done by subtracting the zero-input response s_0 of weighted synthesis filter W(z)/Â(z)

from the weighted speech signal s_w(n). This zero-input response... the weighted speech vector in the subframe, and

s_0 is the zero-input response of filter W(z)/Â(z) which is the output of the combined filter W(z)/Â(z) due to its initial
states. The zero-input response calculator 108 is responsive to the quantized interpolated LP filter Â(z) from the LP
analysis, quantization and interpolation calculator 104 and to the initial states of the weighted synthesis filter

W(z)/Â(z) stored in memory module 111 to calculate the zero-input response due to the initial states as determined

by setting the inputs equal to zero) of filter W(z)/Â(z). This operation is well known to those of ordinary skill in the

target vector x. 

An N-dimensional impulse response vector h of the weighted synthesis filter W(z)/Â(z) is computed in the impulse
response generator 109 using the LP filter coefficients A(z) and Â(z) from module 104. Again, this operation is well

known... impulse response vector h and the open-loop pitch lag T_OL as inputs. Traditionally, the pitch prediction

has been represented by a pitch filter having the following transfer function: where b is the pitch gain and T is the...
...new sample). For pitch lags T>N, the pitch codebook is equivalent to the filter structure 1/(1 - b z^-T), and a
pitch codebook vector v_T(n) at pitch lag ...from the past excitation until the vector is completed (this is not equivalent
to the filter structure).
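
The excerpt above is truncated, so the following is only a sketch of one common way to build an integer-lag pitch codebook vector: copy the last T samples of the past excitation and, when the lag is shorter than the subframe, repeat the samples already copied until the vector is complete. The function and argument names are hypothetical, and fractional lags are not handled.

```python
def pitch_codebook_vector(past_excitation, lag, subframe_len):
    """Build v_T(n), n = 0..subframe_len-1, for an integer pitch lag.

    Samples are read from the past excitation at delay `lag`; if the
    lag is shorter than the subframe, the samples already placed in
    the vector are reused until it is complete.
    """
    if lag < 1 or lag > len(past_excitation):
        raise ValueError("lag must be within the stored past excitation")
    start = len(past_excitation) - lag
    v = []
    for n in range(subframe_len):
        if n < lag:
            v.append(past_excitation[start + n])
        else:
            v.append(v[n - lag])
    return v
```

With a lag at least as long as the subframe, the loop never reuses its own output, which corresponds to the case where the codebook and the pitch-filter view coincide.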

In recent encoders, a higher pitch resolution is used which significantly improves the quality of voiced sound

segments. This is achieved by oversampling the past excitation signal using polyphase interpolation filters. In this case,

the vector v_T(n) usually corresponds to an interpolated version of the... minimize the mean squared weighted error

E between the target vector x and the scaled filtered past excitation. Error E being expressed as: where y_T is the
filtered pitch codebook vector at pitch lag T: n=0,...,N-1.

It can be shown... 5), which significantly simplifies the search procedure. A simple procedure is used for updating the

filtered codevector y_T without the need to compute the convolution for every pitch lag.

Once an... the search (module 107) tests the fractions around that optimum integer pitch lag.

When the pitch predictor is represented by a filter of the form 1/(1 - b z^-T), which is a valid assumption for pitch lags
T>N, the spectrum of the pitch filter exhibits a harmonic structure over the entire frequency range, with a harmonic

frequency related to... to achieve efficient representation of the pitch contribution in voiced segments of wideband

speech, the pitch prediction filter needs to have the flexibility of varying the amount of periodicity over the wideband 

spectrum of wideband signals is disclosed in the present specification, whereby several forms of low pass filters are 

applied to the past excitation and the low pass filter with higher prediction gain is selected. 

When subsample pitch resolution is used, the low pass filters can be incorporated into the interpolation filters used to 
obtain the higher pitch resolution. In this case, the third stage of the fractions around the chosen integer pitch lag are
tested, is repeated for the several interpolation filters having different low-pass characteristics and the fraction and
filter index which maximize the search criterion C are selected. 

A simpler approach is to complete the three stages described above to determine the optimum fractional pitch lag using

only one interpolation filter with a certain frequency response, and select the optimum low-pass filter shape at the end
by applying the different predetermined low-pass filters to the chosen pitch codebook vector v_T and selecting the
low-pass filter which minimizes the pitch prediction error. This approach is discussed in detail below.

Figure 3 illustrates a schematic block diagram... vector v_T corresponds to the interpolated past excitation signal. In

this preferred embodiment, the interpolation filter (in module 301, but not shown) has a low-pass filter characteristic
removing the frequency contents above 7000 Hz.

In a preferred embodiment, K filter characteristics are used; these filter characteristics could be low-pass or band-pass
filter characteristics. Once the optimum codevector v_T is determined and supplied by the pitch codevector generator
302, K filtered versions of v_T are computed respectively using K different frequency shaping filters such as 305^(j),
where j=1, 2, ..., K. These filtered versions are denoted v_f^(j), where j=1, 2, ..., K. The different vectors v_f^(j)...
...response h to obtain the vectors y^(j), where j=0, 1, 2, ..., K. To calculate the mean squared pitch prediction error
for each vector y^(j), the value y^(j) is multiplied by the gain... filter 305^(j) which minimizes the mean squared pitch
prediction error... j=1, 2, ..., K.

To calculate the mean squared pitch prediction error e^(j) for each value of y^(j), the value y^(j) is multiplied... is

calculated in a corresponding gain calculator 306^(j) in association with the frequency shaping filter at index j, using
the following relationship:

In selector 309, the parameters b, T, and j are chosen based on v_T or v_f^(j) which minimizes the mean squared pitch
prediction error e.
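
The selection step can be sketched as below. The gain relationship itself is elided in this excerpt, so the sketch assumes the standard least-squares gain b = (x . y)/(y . y) and the error e = ||x - b y||^2; gain clipping and quantization are not modeled, and all names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_shaping_filter(x, filtered_candidates):
    """Pick the candidate y^(j) that best predicts the target x.

    filtered_candidates[j] is the pitch codevector after frequency
    shaping filter j and convolution with the impulse response h.
    Returns (j, b, e): chosen index, its gain, and its squared error.
    """
    best = None
    for j, y in enumerate(filtered_candidates):
        energy = dot(y, y)
        # Assumed least-squares gain; zero if the candidate is silent.
        b = dot(x, y) / energy if energy > 0.0 else 0.0
        e = sum((xi - b * yi) ** 2 for xi, yi in zip(x, y))
        if best is None or e < best[2]:
            best = (j, b, e)
    return best
```

Index 0 can be reserved for the unfiltered codevector, which matches the j=0, 1, ..., K numbering used for the vectors y^(j).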

Referring back to Figure 1, the pitch codebook index T is encoded and... approach, extra information is needed to

encode the index j of the selected frequency shaping filter in multiplexer 112. For example, if three filters are used
(j=0, 1, 2, 3), then two bits are needed to represent this information. The filter index information j can also be encoded
jointly with the pitch gain b.
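
The bit count in the example follows from the number of distinct index values: three filters plus the unfiltered case give four values, hence two bits. A small sketch of that arithmetic, with a hypothetical helper name, assuming the index is coded on its own rather than jointly with the pitch gain:

```python
def index_bits(num_choices):
    """Bits needed to code one of `num_choices` equally sized indices."""
    if num_choices < 1:
        raise ValueError("need at least one choice")
    return (num_choices - 1).bit_length()
```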

Innovative codebook... by subtracting the LTP contribution: where b is the pitch gain and y_T is the filtered pitch

codebook vector (the past excitation at delay T filtered with the selected low pass filter and convolved with the impulse
response h as described with reference to Figure 3).

The gain g which minimizes the mean-squared error between the target vector and the scaled filtered codevector...

where H is a lower triangular convolution matrix derived from the impulse response vector... channel.

Memory update: 

In memory module 111 (Figure 1), the states of the weighted synthesis filter W(z)/Â(z) are updated by filtering the
excitation signal u = g c_k + b v_T through the weighted synthesis filter. After this filtering, the states of the filter are

memorized and used in the next subframe as initial states for computing the zero... known to those of ordinary skill in

the art can be used to update the filter states. 

DECODER SIDE 

The speech decoding device 200 of Figure 2 illustrates the various steps... scaled codevector g c_k at the output of the

amplifier 224 is processed through an innovation filter 205.

Periodicity enhancement: 

The generated scaled codevector at the output of the amplifier 224 is... improves the quality in case of voiced

segments. This was done in the past by filtering the innovation vector from the innovative codebook (fixed codebook)

218 through a filter in the form 1/(1 - (epsilon) b z^-T) where (epsilon) is a factor below 0... which is part of the present

invention, is disclosed whereby periodicity enhancement is achieved by filtering the innovative codevector c_k from
the innovative (fixed) codebook through an innovation filter 205 (F(z)) whose frequency response emphasizes the
higher frequencies more than lower frequencies. The... to derive the filter F(z) coefficients used in a preferred

embodiment, is to relate them to the amount... where higher frequencies are more strongly emphasized (stronger

overall slope) for higher pitch gains. Innovation filter 205 has the effect of lowering the energy of the innovative

codevector c_k at low... the excitation signal u at lower frequencies more than higher frequencies. Suggested forms

for innovation filter 205 are 

(1) or
(2) where (sigma) or (alpha) are periodicity factors derived from... the pitch codevector v_T from the pitch codebook

201 is then processed through a low-pass filter 202 whose cut-off frequency is adjusted by means of the index j from 
the periodicity factor (sigma) is calculated as follows: 

The enhanced signal c_f is therefore computed by filtering the scaled innovative codevector g c_k through the
innovation filter 205 (F(z)).

The enhanced excitation signal u' is computed by the adder 220 as and the enhanced excitation signal u' is used at 

the input of the LP synthesis filter 206. 

Synthesis and deemphasis 

The synthesized signal s' is computed by filtering the enhanced excitation signal u' through the LP synthesis filter 206
which has the form 1/Â(z), where Â(z) is the interpolated LP filter in the current subframe. As can be seen in Figure 2,
the quantized LP coefficients Â(z) on line 225 from demultiplexer 217 are supplied to the LP synthesis filter 206 to
adjust the parameters of the LP synthesis filter 206 accordingly. The deemphasis filter 207 is the inverse of the
preemphasis filter 103 of Figure 1. The transfer function of the deemphasis filter 207 is given by
D(z) = 1/(1 - (mu) z^-1), where (mu) is a
preemphasis factor with a value located between 0 and 1 (a typical value is (mu) = 0.7). A higher-order filter could also
be used.
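
Because the deemphasis filter is the inverse of the preemphasis filter, applying one after the other should return the original samples. The sketch below illustrates this under the assumption of first-order filters P(z) = 1 - (mu) z^-1 and D(z) = 1/(1 - (mu) z^-1) with zero initial state; the function names are hypothetical.

```python
def preemphasis(samples, mu=0.7):
    """First-order FIR: y(n) = x(n) - mu * x(n-1)."""
    out = []
    prev = 0.0
    for s in samples:
        out.append(s - mu * prev)
        prev = s
    return out


def deemphasis(samples, mu=0.7):
    """First-order IIR inverse: y(n) = x(n) + mu * y(n-1)."""
    out = []
    prev = 0.0
    for s in samples:
        prev = s + mu * prev
        out.append(prev)
    return out
```

In a real codec the filter memories would be carried across frames rather than reset to zero on every call.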

The vector s' is filtered through the deemphasis filter D(z) (module 207) to obtain the vector s_d, which is passed
through the high-pass filter 208 to remove the unwanted frequencies below 50 Hz and further obtain s_h.

Oversampling and it with the same LP synthesis filter used for synthesizing the down-sampled signal S(sup AND). 

The high frequency generation procedure... speech domain using the spectral shaper 215. In the preferred

embodiment, this is achieved by filtering the noise w_g through a bandwidth expanded version of the same LP
synthesis filter used in the down-sampled domain (1/Â(z/0.8)). The corresponding bandwidth expanded LP filter
coefficients are calculated in spectral shaper 215.
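
Bandwidth expansion of an LP filter amounts to scaling coefficient i by the expansion factor raised to the power i, so that A(z/0.8) has coefficients a_i * 0.8^i. The sketch below shows this together with a direct-form all-pole filter for shaping a noise vector; it assumes a[0] == 1 and zero initial filter state, and the names are hypothetical.

```python
def bandwidth_expand(a, gamma=0.8):
    """Coefficients of A(z/gamma), given a = [1, a_1, ..., a_p]."""
    return [coef * gamma ** i for i, coef in enumerate(a)]


def all_pole_filter(a, x):
    """Filter x through 1/A(z): y(n) = x(n) - sum_{i>=1} a[i] * y(n-i)."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y.append(acc)
    return y
```

Shaping a scaled noise vector would then read `all_pole_filter(bandwidth_expand(a_hat), noise)`, where `a_hat` stands for the decoded LP coefficients.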

The filtered scaled noise sequence w_f is then band-pass filtered to the required frequency range to be restored using
the band-pass filter 216. In the preferred embodiment, the band-pass filter 216 restricts the noise sequence to the 
frequency range 5.6-7.2 kHz. The resulting band-pass filtered noise sequence z is added in adder 221 to the 
oversampled synthesized speech signal s... 

Claims: ...in relation to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view 
of synthesizing a wideband signal, said periodicity enhancing device comprising: 

a) a factor generator (204) for calculating a periodicity factor related to the wideband signal; and

b) an innovation filter (205) for filtering the innovative codevector in relation to said periodicity factor to thereby 
reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency 
portion of the innovative codevector. 

3. A periodicity enhancing device as defined in claim 1, wherein said innovation filter has a transfer function of the 
form: where (alpha) is a periodicity factor derived from innovative codevector. 

7. A periodicity enhancing device as defined in claim 1, wherein said innovation filter has a transfer function of the 

form: where (sigma) is a periodicity factor derived from in relation to a pitch codevector and an innovative 

codevector for supplying a signal synthesis filter in view of synthesizing a wideband signal, said periodicity enhancing 
method comprising the steps of: 

a) calculating a periodicity factor related to the wideband signal; and 

b) filtering the innovative codevector in relation to said periodicity factor to thereby reduce energy of a low frequency 

portion of the innovative codevector and enhance periodicity of a low frequency portion of the innovative 

codevector. 

13. A method for enhancing periodicity as defined in claim 10, wherein said filtering comprises processing the
innovation vector through an innovation filter having a transfer function of the form: where (alpha) is a periodicity
factor derived from... innovative codevector.

17. A method for enhancing periodicity as defined in claim 11, wherein said filtering comprises processing the
innovation vector through an innovation filter having a transfer function of the form: where (sigma) is a periodicity
factor derived from... pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients;
b) a pitch codebook responsive to said pitch codebook parameters for producing a pitch... factor generator for

calculating a periodicity factor related to the wideband signal, and said innovation filter for filtering the innovative 
codevector; 

e) a combiner circuit for combining said pitch codevector and said innovative codevector filtered by said innovation 
filter to thereby produce said periodicity enhanced excitation signal; and 

f) a signal synthesis filter for filtering said periodicity enhanced excitation signal in relation to said synthesis filter 
coefficients to thereby produce said synthesized wideband signal. 

22. A decoder for producing a synthesized decoder for producing a synthesized wideband signal as defined in claim 

21, wherein said innovation filter has a transfer function of the form: where (alpha) is a periodicity factor derived 

from decoder for producing a synthesized wideband signal as defined in claim 21, wherein said innovation filter 

has a transfer function of the form: where (sigma) is a periodicity factor derived from from said encoded wideband 

signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients; 

b) a pitch codebook responsive to said pitch codebook parameters for producing a pitch codevector and innovative 

codevector to thereby produce an excitation signal; and 

e) a signal synthesis filter for filtering said excitation signal in relation to said synthesis filter coefficients to thereby 

produce said synthesized wideband signal; the decoder further comprising a periodicity enhancing factor generator 

for calculating a periodicity factor related to the wideband signal, and said innovation filter for filtering the innovative 
codevector. 

32. A decoder for producing a synthesized wideband signal as defined in decoder for producing a synthesized 

wideband signal as defined in claim 31, wherein said innovation filter has a transfer function of the form: where

(alpha) is a periodicity factor derived from decoder for producing a synthesized wideband signal as defined in claim 

31, wherein said innovation filter has a transfer function of the form: where (sigma) is a periodicity factor derived 
from... innovative codevector. 

43. A cellular communication system as defined in claim 41, wherein said innovation filter has a transfer function of 
the form: where (alpha) is a periodicity factor derived from innovative codevector. 

47. A cellular communication system as defined in claim 41, wherein said innovation filter has a transfer function of 

the form: where (sigma) is a periodicity factor derived from 53. A cellular mobile transmitter/receiver unit as 

defined in claim 51, wherein said innovation filter has a transfer function of the form: where (alpha) is a periodicity

factor derived from 57. A cellular mobile transmitter/receiver unit as defined in claim 51, wherein said innovation 

filter has a transfer function of the form: where (sigma) is a periodicity factor derived from... innovative codevector. 

63. A cellular network element as defined in claim 61, wherein said innovation filter has a transfer function of the 
form: where (alpha) is a periodicity factor derived from innovative codevector. 

67. A cellular network element as defined in claim 61, wherein said innovation filter has a transfer function of the 

form: where (sigma) is a periodicity factor derived from 73. A bidirectional wireless communication sub-system as 

defined in claim 71, wherein said innovation filter has a transfer function of the form: where (alpha) is a periodicity 

factor derived from 77. A bidirectional wireless communication sub-system as defined in claim 71, wherein said 

innovation filter has a transfer function of the form: where (sigma) is a periodicity factor derived from... 

Claims: ...a periodicity factor related to the wideband signal; and

b) an innovation filter (205) for filtering the innovative codevector in relation to the periodicity factor, to
thereby... the energy of a low frequency portion... comprises the steps of:

a) calculating a periodicity factor related to the wideband signal; and

b) filtering the innovative codevector in relation to the periodicity factor, to thereby... the energy of a
low frequency portion... comprises the innovative codevector.

13. A method for enhancing periodicity as defined in claim 11, wherein the filtering comprises processing the
innovation vector through an innovation filter having a transfer function of the following form...
...of the innovative codevector.

17. A method for enhancing periodicity as defined in claim 11, wherein the filtering comprises processing the
innovation vector through an innovation filter having a transfer function of the following form...
...comprises a factor generator for calculating a periodicity factor related to the wideband signal and the
innovation filter for filtering the innovative codevector;
e) a combiner circuit for combining the pitch codevector and the innovative...
signal synthesis filter for filtering the periodicity enhanced excitation signal in relation to the

synthesis filter coefficients, to thereby... the synthesized... and the innovative codevector, to thereby

produce an excitation signal; and

e) a signal synthesis filter for filtering the excitation signal in relation to the synthesis filter

coefficients, to thereby produce the synthesized wideband signal... which comprises a factor generator for

calculating a periodicity factor related to the wideband signal and the innovation filter for filtering the
innovative codevector.

32. A decoder for producing a synthesized wideband signal as defined in claim 31, wherein ...
Specification: ...analysis is made to generate, besides the low frequency bandwidth signal, parameters relating to the
high frequency bandwidth energy contents and to the original voice signal spectral characteristics. 

RELP methods enable reproducing speech signal and parameters characterizing the high frequency bandwidth 

components of said voice signal said parameters including energy indications about said high frequency bandwidth 

signal, said voice coding process being further characterized in that and a synthesizer. In the analyzer the input 

speech signal is processed to derive therefrom the following set of speech descriptors: 

(I) the spectral descriptors represented by a set of linear prediction parameters, (see LP Analysis in Fig. 1). 

(II) the base-band signal obtained by band limiting (300-1000 Hz) and subsequently sub-sampling at 2 kHz the residual
(or excitation) signal resulting from the inverse filtering of the speech signal by its predictor (see BB Extraction in
Fig. 1) or by a conventional low frequency filtering operation.

(III) the energy of the upper band (or High-Frequency band) signal (1000 to 3400 Hz) which has been removed from 
the excitation signal by low-pass filtering (see HF Extraction and Energy Computation). 

These speech descriptors are quantized and multiplexed to generate the coded speech data to be provided to the speech 
synthesizer whenever the speech signal needs be reconstructed. 

The synthesizer is made to perform the following operations: 

- decoding and up-sampling to 8 kHz the Base-Band signal (see BB Decode in Fig. 1)

- generating a high frequency signal (1000-3400 Hz) by non-linear distortion, high-pass filtering and energy adjustment
of the base-band signal (see Non Linear Distortion HP Filtering and Energy Adjustment)

- exciting an all-pole prediction filter corresponding to the vocal tract by the sum of the base-band signal and of the... I)

and (II) are separately coded. But the third speech descriptors (III) derived through analysis of the high and low 
frequency bandwidth contents, differs from the descriptor (III) of a conventional RELP as represented in figure... 
...upper-band) (Fig.3d) signals. 

The problem faced with RELP vocoders is to derive at the receiver end (synthesizer) a synthetic high-frequency signal 

from the transmitted base-band signal. As making a non-linear distortion of the base-band signal followed by a 

high-pass filtering and a level adjustment according to the transmitted energy. The signal obtained through these 
operations... pulses of the base-band pulse train. 

The upper-band signal y(n) is then modulated by the windowing signal w(n-K). 

(8) yC')(n) = y(n).w(n-K the upper band according to the pulse/noise model in device (15), as represented in Fig.9. 

This high-frequency signal s(n) is then added to the delayed base-band signal to obtain the excitation signal of the 
predictor filter to be used for performing the LP Synthesis function of Fig.2. 

Fig. 9 shows... each pitch period so as to improve the periodicity of the full high-band signal s(n). This reset is achieved
by the shifted pulse train z(n-K).

The pulse and noise signal components are then summed up and filtered by a high-pass filter 19 which removes the
(0-1000 Hz) band of the upper-band signal s(n). Note on Fig. 5 that the delay introduced by the high-pass filter on the
high-frequency band is compensated by a delay (20) on the base-band...

Claims: ...and a high frequency (HF) bandwidth to be coded separately, said process comprising : - coding said low
frequency bandwidth signal ; - processing said high frequency bandwidth signal to derive therefrom high frequency
energy information... shifting said low frequency bandwidth decoded data using said phase shift information ; -

combining said shifted low frequency decoded data with said high frequency energy data to derive therefrom a 

synthesized upper band signal ; and - adding said process including : - demultiplexing and decoding said linear 

parameters ; - using said decoded linear prediction parameters to adjust a synthesis filter fed with the signal provided 
by said adding operation. 

8. A coding process according to... 14, wherein said upper band analysis means include : 



- windowing means sensitive to said shifted pulse train and to said pitch M to derive therefrom a w(n-k) train ; 

- modulating means sensitive to said w i) and z(n-K) to derive s(n) ; 

- summing means for summing said upper band train s(n) and a delayed x(n) train ; 

- LP synthesis filter tuned by said decoded LP parameters and sensitive to the output of said summing means a 

noise signal component e'(n) = e(n).E^(1/2) ;

- adding means for adding said noise component to said pulse signal component ; and, 

- high pass filter connected to said adding means to provide said s(n). 

Claims: ...comprising: demultiplexing and decoding the linear parameters, using the decoded linear
prediction parameters to adjust a synthesis filter fed with the signal provided by the adding
operation.

8. A coding process...
Detailed Description:

...a predetermined number corresponding typically 

to 10-30 ms. A linear prediction (LP) filter is computed and transmitted every frame. The computation of the LP

filter typically needs a lookahead, a 5-15 ms speech segment from the subsequent frame. The... the decoder, where

the reconstructed excitation signal is used as the input of the LP filter...

As the main applications of low bit rate speech encoding are wireless

mobile communication telephone network) that uses the legacy narrow band speech signals. 

The adaptive codebook, or the pitch predictor, in CELP plays an 

important role in maintaining high speech quality at low bit rates be missing from the adaptive codebook content. 

This will have a severe effect on the pitch predictor in consequent good frames, resulting in long time before the 
synthesis signal converge to the the AMR-WB encoder of Figure 

2, wherein, the down-sampler module, the high-pass filter module and the preemphasis filter module have been 

grouped in a single pre-processing module, and wherein the closed-loop... optional pre-processing module

202. Pre-processing module 202 may consist of a high-pass filter with a 50 Hz cut-off frequency. High-pass filter 202 
removes the unwanted sound components below 50 Hz. 

The down-sampled, pre-processed signal at a sampling frequency of 12.8 

kHz). In an illustrative embodiment of the preemphasis filter 203, the signal sp(n) is preemphasized using a filter 
having the following transfer function. 

P(z) = 1 - (mu) z^-1

where (mu) is a preemphasis... 0 and 1 (a typical value is (mu) = 0.7). The function of the preemphasis filter 203 is to

enhance the high frequency contents of the input speech signal. It also quality. This will be explained in more detail 

herein below. 

The output of the preemphasis filter 203 is denoted s(n). This signal is

used for performing LP analysis in module... are computed from the windowed signal, and Levinson-Durbin

recursion is used to compute LP filter coefficients, a_i, where i=1,...,p, and where p is the LP order, which... The

parameters a_i are the coefficients of the transfer function A(z) of the LP filter, which is given by the following
relation.

A(z) = 1 + sum_{i=1}^{p} a_i z^-i

LP analysis is performed in module 204, which also performs the 

quantization and interpolation of the LP filter coefficients. The LP filter coefficients 

are first transformed into another equivalent domain more suitable for 

quantization and interpolation purposes are two domains in which quantization and interpolation can be efficiently 

performed. The 16 LP filter coefficients, a_i, can be quantized in the order of 30 to 50 bits using split or a

combination thereof. The purpose of the interpolation is to enable updating the LP filter coefficients every subframe
while transmitting them once every frame, which improves the encoder performance without increasing the bit rate.
Quantization and interpolation of the LP filter coefficients is believed to be otherwise well known to those of ordinary

skill in the... 64 samples at the sampling frequency of 12.8 kHz). In the following description, the filter A(z) denotes

the unquantized interpolated LP filter of the subframe, and the filter Â(z) denotes the quantized interpolated LP filter

of the subframe. The filter Â(z) is supplied every subframe to a multiplexer 213 for transmission through a...

weighted domain. The weighted signal sw(n) is computed in a perceptual weighting filter 205 in response to the signal 
s(n) from the pre-emphasis filter 203. A perceptual weighting filter 205 with fixed denominator, suited for wideband 
signals, is used. An example of transfer function for the perceptual weighting filter 205 is given by the following 
relation. 

W(z) = A(z/(gamma)_1)/(1 - (gamma)_2 z^-1) where... computed. This is usually done by subtracting the zero-input response s0 of

weighted synthesis filter W(z)/Â(z) from the weighted speech signal sw(n). This zero-input response... calculated by

a zero-input response calculator 208 in response to the quantized interpolation LP filter Â(z) from the LP analysis,
quantization and interpolation module 204 and to the initial states of the weighted synthesis filter W(z)/Â(z) stored in
memory update module 211 in response to the LP filters A(z) and Â(z), and the excitation vector u. This operation is
well known... not be further described.

An N-dimensional impulse response vector h of the weighted synthesis filter
W(z)/Â(z) is computed in the impulse response generator 209 using the

coefficients of the LP filter A(z) and Â(z) from module 204. Again, this operation is well known to... finding the best

pitch lag T and gain b that 

minimize a mean squared weighted pitch prediction error, for example
e^(j) = ||x - b^(j) y^(j)||^2, where j = 1, 2, ..., K

between the target vector x and a scaled filtered version of the past excitation. 

More specifically, in the present illustrative implementation, the pitch (pitch... 15), which significantly simplifies the

search procedure. A simple procedure is used for updating the filtered codevector y_T (this vector is defined in the

following description) without the need to compute... spectrum. This is achieved by processing the pitch codevector

through a plurality of frequency shaping filters (for example low-pass or band-pass filters). And the frequency shaping
filter that minimizes the mean-squared weighted error e^(j) is selected. The selected frequency shaping filter is
identified by an index j.

The pitch codebook index T is encoded and transmitted to...

x' = x - b y_T

where b is the pitch gain and y_T is the filtered pitch codebook vector (the past excitation at delay T filtered with the
selected frequency shaping filter (index j) and convolved with the impulse response h).

The innovative excitation search procedure in CELP is... which minimize the mean-squared error E between the

target vector x' and a scaled filtered version of the codevector c_k, for example.

E = ||x' - g H c_k||^2

where H is a codebook is a dynamic 

codebook consisting of an algebraic codebook followed by an adaptive pre-filter 
F(2) which enhances special spectral components in order to improve the 

synthesis speech quality excitation signal u improves the quality of voiced segments. The periodicity enhancement 

is achieved by filtering the 

innovative codevector ck from the innovation (fixed) codebook through an 

innovation filter F(z) (pitch enhancer 305) whose frequency response emphasizes the higher frequencies more than the 
lower frequencies. The coefficients of the 5 innovation filter F(z) are related to the amount of periodicity in the 
excitation signal u. 

An efficient, illustrative way to derive the coefficients of the innovation filter F(z) is to relate them to the amount of pitch contribution in the total ... higher frequencies are more strongly emphasized (stronger overall slope) for higher pitch gains. The innovation filter 305 has the effect of lowering the energy of the innovation codevector ck at lower...
...signal u at lower frequencies more than higher frequencies. A suggested form for the innovation filter 305 is the following.

F(z) = -\alpha z + 1 - \alpha z^{-1}

where α is a periodicity factor derived ... to produce a pitch codevector.

The pitch codevector is then processed through a low-pass filter 302 whose cutoff frequency is selected in relation to index j from the demultiplexer 317 to produce the filtered pitch codevector vT. The filtered pitch codevector vT is then amplified by the pitch gain b by an amplifier 326 and ... 0.25 for purely voiced signals.

The enhanced signal cf is therefore computed by filtering the scaled 
innovative codevector gck through the innovation filter 305 (F(z)). 
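As a rough illustration of this filtering step, the sketch below applies a symmetric three-tap filter of the form F(z) = -αz + 1 - αz^{-1} to a codevector, i.e. out[n] = c[n] - α(c[n+1] + c[n-1]). Treating samples outside the subframe as zero is an assumption of this sketch, and the function name `innovation_filter` is hypothetical.

```python
import numpy as np

def innovation_filter(c, alpha):
    """Apply F(z) = -alpha*z + 1 - alpha*z^-1 to the codevector c.

    Samples outside the vector are taken as zero (an assumption of this sketch).
    """
    c = np.asarray(c, dtype=float)
    padded = np.concatenate(([0.0], c, [0.0]))
    # padded[2:] is c[n+1] and padded[:-2] is c[n-1], both zero at the edges
    return c - alpha * (padded[2:] + padded[:-2])
```

The frequency response of this filter is 1 - 2α cos(ω), so for α > 0 the gain is smallest at ω = 0 and largest at ω = π, which matches the stated emphasis of higher frequencies over lower ones.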

The enhanced excitation signal u' is computed by the adder 320 as ... and the enhanced excitation signal u' is used at the input of the LP synthesis filter 306.

The synthesized signal s' is computed by filtering the enhanced excitation signal u' through the LP synthesis filter 306 which has the form 1/Â(z), where Â(z) is the quantized, interpolated LP filter in the current subframe. As can be seen in Figure 3, the quantized, interpolated LP filter Â(z) on line 325 from the demultiplexer 317 is supplied to the LP synthesis filter 306 to adjust the parameters of the LP synthesis filter 306 accordingly. The deemphasis filter 307 is the inverse of the preemphasis filter 203 of Figure 2. The transfer function of the deemphasis filter 307 is given by

D(z) = 1 / (1 - \mu z^{-1})

where μ is a preemphasis factor located between 0 and 1 (a typical value is μ = 0.7). A higher-order filter could also be used.
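The inverse relationship between the two filters can be checked with a short sketch. It assumes first-order filters with zero initial state: a preemphasis of the form 1 - μz^{-1} and the deemphasis D(z) = 1/(1 - μz^{-1}). The function names are hypothetical, and a real codec would carry filter memory across frames.

```python
import numpy as np

def preemphasis(s, mu=0.7):
    """y[n] = s[n] - mu*s[n-1], with s[-1] assumed to be 0."""
    s = np.asarray(s, dtype=float)
    out = s.copy()
    out[1:] -= mu * s[:-1]
    return out

def deemphasis(s, mu=0.7):
    """y[n] = s[n] + mu*y[n-1], with y[-1] assumed to be 0."""
    s = np.asarray(s, dtype=float)
    out = np.empty(len(s))
    prev = 0.0
    for n, v in enumerate(s):
        prev = v + mu * prev   # first-order recursive (all-pole) filter
        out[n] = prev
    return out
```

Applying `deemphasis(preemphasis(s))` returns the original samples up to rounding, which is the sense in which the deemphasis filter undoes the preemphasis.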

The vector s' is filtered through the deemphasis filter D(z) 307 to obtain the vector sd, which is processed through the high-pass filter 308 to remove the unwanted frequencies below 50 Hz and further obtain sh.

The oversampler ... 310 and requires input from voicing factor generator 304 (Figure 3).

The resulting band-pass filtered noise sequence z from the high frequency generation module 310 is added by the adder ...

Parameter             Bits / Frame
LP Parameters         46
Pitch Delay           30 = 9 + 6 + 9 + 6
Pitch Filtering       4 = 1 + 1 + 1 + 1
Gains                 28 = 7 + 7 + 7 + 7
Algebraic Codebook    144 = 36...

...desynchronized from the encoder. The main reason is that low bit rate encoders rely on pitch prediction, and during erased frames, the memory of the pitch predictor is no longer the same as the one at the encoder. The problem is amplified ... the AMR-WB encoder 400. In this simplified block diagram, the downsampler 201, high-pass filter 202 and preemphasis filter 203 are grouped together in the preprocessing module 401.

Also, the closed-loop search module ... In the present illustrative embodiment, the spectral tilt is estimated as a ratio between the energy concentrated in low frequencies and the energy concentrated in high frequencies. However, it can also be estimated in different ways such as ... bands have been excluded from the computation to improve the discrimination between frames with high energy concentration in low frequencies (generally voiced) and with high energy concentration in high frequencies (generally unvoiced). In between, the ... for any of the classes and would increase the decision confusion.
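A simplified version of this ratio can be sketched as follows. The sketch splits an FFT power spectrum at a single hypothetical boundary (`split_hz`, chosen arbitrarily here) instead of using the critical-band energies and band exclusions of the described embodiment; the 12.8 kHz default only mirrors the sampling frequency mentioned elsewhere in the text. All names are assumptions made for illustration.

```python
import numpy as np

def spectral_tilt(frame, fs=12800.0, split_hz=2000.0):
    """Ratio of low-band to high-band energy of one frame.

    split_hz is a hypothetical boundary; the described embodiment works on
    critical bands and excludes some of them, which this sketch does not do.
    """
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    low = power[freqs < split_hz].sum()
    high = power[freqs >= split_hz].sum()
    return low / max(high, 1e-12)   # guard against an empty high band
```

With this convention a ratio well above 1 suggests energy concentrated in low frequencies (generally voiced frames), and a ratio well below 1 suggests energy concentrated in high frequencies (generally unvoiced frames).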

In module 500, the energy in low frequencies is computed differently for long pitch periods and short pitch periods. For voiced female speech ... to the nearest harmonics are taken into account. Hence, if the structure is harmonic in low frequencies, only high energy terms will be included in the sum. On the other hand, if the structure is ... will be random and the sum will be smaller. Thus even unvoiced sounds with high energy content in low frequencies can be detected. This processing cannot be done for longer pitch periods, as the frequency ... and also for a priori unvoiced sounds (i.e.

when rx+re<0.6), the low frequency energy estimation is done per critical band and is computed as

E_l = \frac{1}{10} \sum_{i=0}^{9} e(i)

... of the weighted speech signal sw(n) of the current frame from the perceptual weighting filter 205 and Ee is the energy of the error between this weighted speech signal and the weighted synthesis signal of the current frame from the perceptual weighting filter 205'.


The pitch stability counter PC assesses the variation of the pitch period ... be used at the decoder to help the classification, as the parameters of the LP filter or the pitch stability.

In case of source-controlled variable bit rate coder, the information ... if the energy control is most important for voiced speech because of the long term prediction (pitch prediction), it is important also for unvoiced speech. The reason here is the prediction of the ... domain has the disadvantage of not taking into account the influence of the LP synthesis filter. This can be particularly tricky in the case of voiced recovery after several lost voiced ... is typically used during the concealment with some attenuation strategy. When a new LP synthesis filter arrives with the first good frame after the erasure, there can be a mismatch between the excitation energy and the gain of the LP synthesis filter. The new synthesis filter can produce a synthesis signal with an energy highly different from the energy of the... obtained when the position of the first glottal pulse is measured on the low-pass filtered residual signal.

The position of the first glottal pulse is coded using 6 bits in ... be however easily applied to any speech codec where the synthesis signal is generated by filtering an excitation signal through an LP synthesis filter. The concealment strategy can be summarized as a convergence of the signal energy and the attenuation factor α. The factor α is further dependent on the stability of the LP filter for UNVOICED frames. In general, the convergence is slow if the last good received frame ... A stability factor θ is computed based on a distance measure between the adjacent LP filters. Here, the factor θ is related to the ISF (Immittance Spectral Frequencies) distance measure and...
... 1st erased frame after a good frame, this pitch pulse is first low-pass filtered. The filter used is a simple 3-tap linear phase FIR filter with filter coefficients equal to 0.18, 0.64 and ... If a voicing information is available, the filter can be also selected dynamically with a cut-off frequency dependent on the voicing.

The ... correctly received or non erased) received frame is different from UNVOICED, the innovation excitation is filtered through a linear phase FIR high-pass filter with coefficients -0.0125, -0.109, 0.7813, -0.109, ... To decrease the amount of noisy components during voiced segments, these filter coefficients are multiplied by an adaptive factor equal to (0.75 - 0.25 rv) ... is available.

Spectral Envelope Concealment, Synthesis and updates

To synthesize the decoded speech, the LP filter parameters must be obtained. The spectral envelope is gradually moved to the estimated envelope ... ISF of the estimated comfort noise envelope and p is the order of the LP filter.

The synthesized speech is obtained by filtering the excitation signal through the LP synthesis filter. The filter coefficients are computed from the ISF representation and are interpolated for each subframe (four (4 ... are using the past excitation signal to encode the present frame excitation (long-term or pitch prediction). Also, most of the quantizers (LP quantizers, gain quantizers) make use of a prediction.

Artificial ... lost onset, the periodic part of the excitation is constructed artificially as a low-pass filtered periodic train of pulses separated by a pitch period. In the present illustrative embodiment, the low-pass filter is a simple linear phase FIR filter with the impulse response hlow = {-0.0125, 0.109, 0.7813, 0.109, -0.0125}. However, the filter could be also selected dynamically with a cut-off frequency corresponding to the voicing information ... pitch periods of all subframes where the artificial onset reconstruction is used.

The low-pass filtered impulse train is realized by placing the impulse responses of the low-pass filter in the adaptive excitation buffer (previously initialized to zero). The first impulse response will be ... defined in Equations 16 and 17) and divided by the gain of the LP synthesis filter. The LP synthesis filter gain is computed as

g_LP ... h^2(i), i = 0 ...     (31)

where h(i) is the LP synthesis filter impulse response. Finally, the artificial onset gain is reduced by multiplying the periodic part with ... of the artificial onset and the regular CELP decoding could be used instead.
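The construction of the low-pass filtered pulse train can be sketched as below. This is only an illustration of placing filter impulse responses in a zeroed buffer at pitch-period spacing: the tap values follow the five-tap response quoted in the text with the signs of the outer taps assumed, the gain scaling described above is omitted, and the function name and integer pitch period are hypothetical simplifications.

```python
import numpy as np

# Five-tap linear-phase low-pass response; the outer-tap signs are an assumption.
H_LOW = np.array([-0.0125, 0.109, 0.7813, 0.109, -0.0125])

def lowpass_pulse_train(length, first_pulse, pitch_period, h=H_LOW):
    """Place copies of h, centred on each pulse position, in a zeroed buffer.

    Pulses start at first_pulse and repeat every pitch_period samples
    (an integer period is assumed for simplicity).
    """
    if pitch_period <= 0:
        raise ValueError("pitch_period must be positive")
    buf = np.zeros(length)            # adaptive excitation buffer, initialised to zero
    half = len(h) // 2
    pos = first_pulse
    while pos < length:
        for k, coeff in enumerate(h):
            idx = pos - half + k
            if 0 <= idx < length:     # drop taps that fall outside the buffer
                buf[idx] += coeff
        pos += pitch_period
    return buf
```

For example, `lowpass_pulse_train(64, 10, 20)` produces peaks at samples 10, 30 and 50, each surrounded by the smaller side taps of the low-pass response.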
The LP filter for the output speech synthesis is not interpolated in the case of an artificial onset ... frame is typically used during the concealment with some attenuation strategy. When a new LP filter arrives with the first good frame after the erasure, there can be a mismatch between the excitation energy and the gain of the new LP synthesis filter. The new synthesis filter can produce a synthesis signal with an energy highly different from the energy of the...
...implementation.

Conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame using the following relation.

If Eq cannot be ... be taken because of the possible mismatch, between the excitation signal energy and the LP filter gain, mentioned previously. A particularly dangerous situation arises when the gain of the LP filter of a first non erased frame received following frame erasure is higher than the gain of the LP filter of a last frame erased during that frame erasure. In that particular case, the energy of the LP filter excitation signal produced in the decoder during the received first non erased frame is adjusted to a gain of the LP filter of the received first non erased frame using the following relation.

E_q = E_1 \frac{E_{LP0}}{E_{LP1}}

where ELP0 is the energy of the LP filter impulse response of the last good frame before the erasure and ELP1 is the energy of the LP filter of the first good frame after the erasure. In this implementation, the LP filters of the last subframes in a frame are used. Finally, the value of Eq is...
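The relation can be illustrated with a short sketch that computes the impulse-response energies of two LP synthesis filters 1/A(z) and scales an energy value by their ratio. The truncation length `n = 64`, the function names, and the coefficient convention a = [1, a_1, ..., a_p] are assumptions of this sketch; the decision of when to apply the adjustment (the gain comparison described above) is left to the caller.

```python
import numpy as np

def lp_impulse_response(a, n=64):
    """First n samples of the impulse response of 1/A(z), a = [1, a_1, ..., a_p]."""
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0              # unit impulse input
        for k in range(1, min(i, len(a) - 1) + 1):
            acc -= a[k] * h[i - k]                # all-pole recursion
        h[i] = acc
    return h

def adjusted_energy(e1, a_last_good, a_first_good, n=64):
    """E_q = E_1 * E_LP0 / E_LP1, with energies taken over n samples (assumed)."""
    e_lp0 = float(np.sum(lp_impulse_response(a_last_good, n) ** 2))
    e_lp1 = float(np.sum(lp_impulse_response(a_first_good, n) ** 2))
    return e1 * e_lp0 / e_lp1
```

When the new filter has the higher impulse-response energy, the ratio is below 1 and the excitation energy is reduced accordingly, which is the situation the text singles out as dangerous.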



Claims: 

...defined in claim 13, comprising estimating the spectral tilt parameter as a ratio between an energy concentrated in 
low frequencies and an energy concentrated in high frequencies. 

17 A method as defined in claim 13, comprising estimating the... a non erased unvoiced frame after frame erasure, generating no periodic part of a LP filter excitation signal; following receiving, after frame erasure, of a non erased frame other than unvoiced, constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.

27 A method as defined in claim 26, wherein constructing the periodic part of the LP filter excitation signal comprises filtering the repeated last pitch period of the previous frame through a low-pass filter.

28 A method as defined in claim 27, wherein:

determining concealment/recovery parameters comprises computing a voicing information parameter; the low-pass filter has a cut-off frequency; and constructing the periodic part of the excitation signal comprises ... and decoder recovery comprises randomly generating a non-periodic, innovation part of a LP filter excitation signal.

30 A method as defined in claim 29, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises generating a random noise.

31 A method as defined in claim 29, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises randomly generating vector indexes of an innovation codebook.

32 A method as ... transition, voiced, or onset; and

randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises: if the last correctly received frame is different from unvoiced, filtering the innovation part of the excitation signal through a high pass filter; and if the last correctly received frame is unvoiced, using only the innovation part of ... lost onset by constructing a periodic part of an excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

34 A method as defined in ... and

conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame.

40 A method as defined in claim 39 wherein:

adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame comprises using the following relation: Eq = E1 ELP0...






...of the current frame, ELP0 is the energy of an impulse response of the LP filter to the last non erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.

41 A method ... a non erased unvoiced frame after frame erasure, generating no periodic part of a LP filter excitation signal; following receiving, after frame erasure, of a non erased frame other than unvoiced, constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.

48 A method as defined in claim 47, wherein constructing the periodic part of the excitation signal comprises filtering the repeated last pitch period of the previous frame through a low-pass filter.

49 A method as defined in claim 48, wherein:

determining, in the decoder, concealment/recovery parameters comprises computing a voicing information parameter; the low-pass filter has a cut-off frequency; and constructing the periodic part of the LP filter excitation signal comprises dynamically adjusting the cut-off frequency in relation to the voicing information...
...erasure concealment and decoder recovery comprises randomly generating a non-periodic, innovation part of a LP filter excitation signal.

51 A method as defined in claim 50, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises generating a random noise.

52 A method as defined in claim 50, wherein randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises randomly generating vector indexes of an innovation codebook.

53 A method as ... transition, voiced, or onset; and

randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises: if the last received non erased frame is different from unvoiced, filtering the innovation part of the LP filter excitation signal through a high pass filter; and if the last received non erased frame is unvoiced, using only the innovation part of the LP filter excitation signal.

54 A method as defined in claim 50, wherein:

the sound signal is ... lost onset by constructing a periodic part of an excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

55 A method as defined in ... frame erasure concealment and decoder recovery further comprises constructing an innovation part of the LP filter excitation signal by means of normal decoding.

56 A method as defined in claim 55, wherein constructing an innovation part of the LP filter excitation signal comprises randomly choosing entries of an innovation codebook.

57 A method as defined ... and

conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame using the following relation: Eq = E1 ELP0 / ELP1 ... of the current frame, ELP0 is the energy of an impulse response of the LP filter to the last non erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.

60 A device for improving ... claim 72, comprising means for estimating the spectral tilt parameter as a ratio between an energy concentrated in low frequencies and an energy concentrated in high frequencies.

76 A device as defined in claim 72, comprising means for ... erased unvoiced frame after frame erasure, means for generating no periodic part of a LP filter excitation signal; following receiving, after frame erasure, of a non erased frame other than unvoiced, means for constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.

86 ... defined in claim 85, wherein the means for constructing the periodic part of the LP filter excitation signal comprises a low-pass filter for filtering the repeated last pitch period of the previous frame.

87 A device as defined in ... determining concealment/recovery parameters comprises means for computing a voicing information parameter; the low-pass filter has a cut-off frequency; and the means for constructing the periodic part of the ... decoder recovery comprises means for randomly generating a non-periodic, innovation part of a LP filter excitation signal.

89 A device as defined in claim 88, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for generating a random noise.

90 A device as defined in claim 88, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for randomly generating vector indexes of an innovation codebook.

91 A ... onset; and

the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises: if the last correctly received frame is different from unvoiced, a high-pass filter for filtering the innovation part of the excitation signal; and if the last correctly received frame is ... for conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, means for adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame.
99 A device as defined in claim 98, wherein:

the means for adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame comprises means for using the following relation: Eq ... of the current frame, ELP0 is the energy of an impulse response of the LP filter to the last non erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.

100. A device as... erased unvoiced frame after frame erasure, means for generating no periodic part of a LP filter excitation signal; following receiving, after frame erasure, of a non erased frame other than unvoiced, means for constructing a periodic part of the LP filter excitation signal by repeating a last pitch period of a previous frame.

107. A device ... the means for constructing the periodic part of the excitation signal comprises a low-pass filter for filtering the repeated last pitch period of the previous frame.

108. A device as defined in ... decoder, concealment/recovery parameters comprises means for computing a voicing information parameter; the low-pass filter has a cut-off frequency; and the means for constructing the periodic part of the LP filter excitation signal comprises means for dynamically adjusting the cut-off frequency in relation to the...
...decoder recovery comprises means for randomly generating a non-periodic, innovation part of a LP filter excitation signal.

110. A device as defined in claim 109, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for generating a random noise.

111. A ... 109, wherein the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal comprises means for randomly generating vector indexes of an innovation codebook.

112. A ... onset; and the means for randomly generating the non-periodic, innovation part of the LP filter excitation signal further comprises: if the last received non erased frame is different from unvoiced, a high-pass filter for filtering the innovation part of the LP filter excitation signal; and if the last received non erased frame is unvoiced, means for using only the innovation part of the LP filter excitation signal.

113. A device as defined in claim 109, wherein: the sound ... lost onset by constructing a periodic part of an excitation signal as a low-pass filtered periodic train of pulses separated by a pitch period.

114. A device as defined in...
...concealment and decoder recovery further comprises means for constructing an innovation part of the LP filter excitation signal by means of normal decoding.

115. A device as defined in claim 114, wherein the means for constructing an innovation part of the LP filter excitation signal comprises means for randomly choosing entries of an innovation codebook.

116. A device ... for conducting frame erasure concealment and decoder recovery comprises, when a gain of a LP filter of a first non erased frame received following frame erasure is higher than a gain of a LP filter of a last frame erased during said frame erasure, means for adjusting the energy of an LP filter excitation signal produced in the decoder during the received first non erased frame to a gain of the LP filter of said received first non erased frame using the following relation: Eq = E1 ELP0 / ELP1 ... of the current frame, ELP0 is the energy of an impulse response of the LP filter to the last non erased frame received before the frame erasure, and ELP1 is the energy of the impulse response of the LP filter to the received first non erased frame following frame erasure.

119. A system for...
...in relation to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view of producing a synthesized wideband signal. In this periodicity enhancing device and method ... is responsive to the adaptive and innovative codevectors for calculating a periodicity factor. An innovation filter subsequently processes the innovative codevector in relation to this periodicity factor to reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of the excitation signal. As an example, the innovation filter has a transfer function of the form: F(z) = -αz + 1 - α...

Detailed Description: 

...to a method and device for enhancing periodicity of the excitation of a signal synthesis filter in view of producing a synthesized wideband signal.

2. Brief description of the prior art ... and wireless applications, as well as Internet and packet network applications. Until recently, telephone bandwidths filtered in the range 200-3400 Hz were mainly used in speech coding applications. However, there ... number (corresponding to 10-30 ms of speech). In CELP, a linear prediction (LP) synthesis filter is computed and transmitted every frame. The L-sample frame is then divided into smaller ... signal is transmitted and used at the decoder as the input of the LP synthesis filter in order to obtain the synthesized speech.

An innovative codebook in the CELP context, is ... synthesize speech according to the CELP technique, each block of N samples is synthesized by filtering an appropriate codevector from a codebook through time varying filters modeling the spectral characteristics of the speech signal. At the encoder end, the synthesis output ... perceptually weighted distortion measure. This perceptual weighting is performed using a so-called perceptual weighting filter, which is usually derived from the LP synthesis filter.

The CELP model has been very successful in encoding telephone band sound signals, and several ... improves the quality in case of voiced segments. This was done in the past by filtering the innovative codevector from the fixed codebook through a filter having a transfer function of the form 1/(1 - εbz^{-T}) where ε is a ... invention is to propose a new alternative approach by which periodicity enhancement is achieved through filtering the innovative codevector by an innovation filter which reduces the low frequency contents of the innovative codevector, whereby the innovative contribution...
...in relation to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view of synthesizing a wideband signal. In this periodicity enhancing method, a periodicity factor related to the wideband signal is calculated. Then, the innovative codevector is filtered in relation to the periodicity factor to thereby reduce energy of a low frequency
portion of the innovative codevector and enhance periodicity of a low frequency portion of the excitation signal ... produced in relation to adaptive and innovative codevectors for supplying a signal synthesis filter in view of synthesizing a wideband signal, comprises.

a) a factor generator for calculating a periodicity factor related to said wideband signal; and

b) an innovative filter for filtering the innovative codevector in relation to the periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of the excitation signal.

According to a first preferred embodiment.

- the innovative codevector is filtered with a transfer function of the form.
F(z) = -\alpha z + 1 - \alpha z^{-1}

where α is ... of the innovative codevector.

According to a second preferred embodiment.

- the innovative codevector is filtered with a transfer function of the form.
F(z) = 1 - \sigma z^{-1}

where σ is ... from this encoded wideband signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients;

b) a pitch codebook responsive to the pitch codebook parameters for producing a pitch ... factor generator for calculating a periodicity factor related to the wideband signal; and the innovation filter for filtering the innovative codevector in relation to the periodicity factor;

e) a combiner circuit for combining the pitch codevector and the innovative codevector filtered by the innovation filter to thereby produce a periodicity-enhanced excitation signal; and

f) a signal synthesis filter for filtering that periodicity-enhanced excitation signal in relation to the synthesis filter coefficients to thereby produce the synthesized wideband signal.

According to the present invention, in ... from this encoded wideband signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients; a pitch codebook responsive to the pitch codebook parameters for producing a pitch codevector ... codevector and the innovative codevector to thereby produce an excitation signal; and a signal synthesis filter for filtering that excitation signal in relation to the synthesis filter coefficients to thereby produce the synthesized wideband signal;

the improvement therein comprising a periodicity enhancing ... factor generator for calculating a periodicity factor related to the wideband signal; and the innovation filter for filtering the innovative codevector in relation to the periodicity factor before supplying this innovative codevector to... and below such as Code-Excited Linear Prediction (CELP) encoders typically use a LP synthesis filter to model the short-term spectral envelope of the voice signal. The LP...
...signal in the frame are computed, encoded, and transmitted. LP parameters representing the LP synthesis filter are usually computed once every frame. The frame is further divided into smaller blocks of ... pre processing, and preemphasis);

sw Weighted speech vector;
s0 Zero-input response of weighted synthesis filter;
sp Down-sampled pre-processed signal;
Oversampled synthesized speech signal;
s' Synthesis signal before deemphasis;
x Target vector for pitch search;
x' Target vector for innovation search;
h Weighted synthesis filter impulse response;
vT Adaptive (pitch) codebook vector at delay T;
yT Filtered pitch codebook vector (vT convolved with h);
ck Innovative codevector at index k (k-th entry... codebook index);
b Pitch gain (or pitch codebook gain);
j Index of the low-pass filter used on the pitch codevector;
k Codevector index (innovation codebook entry); and
g Innovation codebook ... optional pre-processing block 102.

Pre-processing block 102 may consist of a high-pass filter with a 50 Hz cut-off frequency. High-pass filter 102 removes the unwanted sound components below 50 Hz.

The down-sampled pre-processed signal at a sampling frequency of 

12.8 kHz). In a preferred embodiment of the preemphasis filter 103, the 

signal sp(n) is preemphasized using a filter having the following transfer function. 

$$P(z) = 1 - \mu z^{-1}$$

where $\mu$ is a preemphasis value located between 0 and 1 (a typical value is $\mu = 0.7$). A higher-order filter could also be used. It should be pointed out that high-pass filter 102 and preemphasis filter 103 can be interchanged to obtain more efficient fixed-point implementations.
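
As an illustration only, a first-order preemphasis of the form $P(z) = 1 - \mu z^{-1}$ corresponds to the difference equation $s(n) = s_p(n) - \mu s_p(n-1)$. The Python sketch below assumes zero initial filter state and uses hypothetical function names; it also includes the inverse recursion $1/(1 - \mu z^{-1})$, the form the specification gives for the decoder-side deemphasis, to show that the two cancel.

```python
def preemphasize(x, mu=0.7):
    """Apply P(z) = 1 - mu*z^-1, i.e. y[n] = x[n] - mu*x[n-1].

    Assumption: the sample before the first one is zero.
    """
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - mu * prev)
        prev = sample
    return y


def deemphasize(y, mu=0.7):
    """Apply 1/(1 - mu*z^-1), i.e. x[n] = y[n] + mu*x[n-1].

    Assumption: zero initial state, matching preemphasize().
    """
    x = []
    prev = 0.0
    for sample in y:
        prev = sample + mu * prev
        x.append(prev)
    return x


if __name__ == "__main__":
    signal = [1.0, 1.0, 1.0, 1.0]
    emphasized = preemphasize(signal)
    # A constant (low-frequency) input is attenuated after the first sample.
    print(emphasized)
    # Deemphasis undoes the preemphasis up to rounding error.
    print(deemphasize(emphasized))
```

In a frame-based codec the `prev` state would be carried from one frame to the next rather than reset to zero; that bookkeeping is left out of this sketch.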


The function of the preemphasis filter 103 is to enhance the high-frequency contents of the input signal. It also reduces ... quality. This will be explained in more detail herein below.

The output of the preemphasis filter 103 is denoted s(n). This signal is used for performing LP analysis in calculator ... are computed from the windowed signal, and Levinson-Durbin recursion is used to compute LP filter coefficients, $a_i$, where $i = 1, \ldots, p$ and where $p$ is the LP order, which is typically 16 in wideband coding. The parameters $a_i$ are the coefficients of the transfer function of the LP filter, which is given by the following relation.

$$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}$$
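
The Levinson-Durbin step can be sketched as follows, under the sign convention $A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}$. This is a minimal textbook version in Python, not the encoder's fixed-point routine: the function names are hypothetical, and the windowing applied before the autocorrelations are computed is omitted.

```python
def autocorrelation(s, p):
    """Return r[0..p] with r[k] = sum_n s[n] * s[n-k] (no window applied)."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the LP normal equations for A(z) = 1 + sum_i a_i z^-i.

    Returns (a, err): a[0] == 1.0, a[1..p] are the LP coefficients and
    err is the final prediction error energy. Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        # Correlation of the current order-(i-1) predictor with lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of order i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Autocorrelation of a first-order process: the order-2 solution
    # should place (almost) all the prediction in a[1].
    coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], 2)
    print(coeffs, residual_energy)
```

With $p = 16$, as mentioned for wideband coding, the same routine would be called with seventeen autocorrelation values.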

LP analysis is performed in calculator module 104, which also performs the quantization and interpolation of the LP filter coefficients. The LP filter coefficients are first transformed into another equivalent domain more suitable for quantization and interpolation purposes ... are two domains in which quantization and interpolation can be efficiently performed. The 16 LP filter coefficients, $a_i$, can be quantized in the order of 30 to 50 bits using split ... or a combination thereof. The purpose of the interpolation is to enable updating the LP filter coefficients every subframe while transmitting them once every frame, which improves the encoder performance without increasing the bit rate. Quantization and interpolation of the LP filter coefficients is believed to be otherwise well known to those of ordinary skill in the ... rest of the coding operations performed on a subframe basis. In the following description, the filter $A(z)$ denotes the unquantized interpolated LP filter of the subframe, and the filter $\hat{A}(z)$ denotes the quantized interpolated LP filter of the subframe.

Perceptual Weighting. 

In analysis-by-synthesis encoders, the optimum pitch and innovation ... and weighted synthesis speech.

The weighted signal $s_w(n)$ is computed in a perceptual weighting filter 105. Traditionally, the weighted signal $s_w(n)$ is computed by a weighting filter having a transfer function $W(z)$ in the form

$$W(z) = A(z/\gamma_1) / A(z/\gamma_2)$$

... of the perceptual weighting filter 105. This result is well described by B.S. Atal and M.R. Schroeder in ... of weighting is controlled by the factors $\gamma_1$ and $\gamma_2$.

The above traditional perceptual weighting filter 105 works well with telephone band signals. However, it was found that this traditional perceptual weighting filter 105 is not suitable for efficient perceptual weighting of wideband signals. It was also found that the traditional perceptual weighting filter 105 has inherent limitations in modelling the formant structure and the required spectral tilt concurrently ... range between low and high frequencies. The prior art has suggested adding a tilt filter into W(z) in order to control the tilt and formant weighting of the wideband ... solution to this problem is, in accordance with the present invention, to introduce the preemphasis filter 103 at the input, compute the LP filter A(z) based on the preemphasized speech s(n), and use a modified filter W(z) by fixing its denominator.
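
A weighting filter with a fixed denominator can be sketched in Python as follows, assuming the first-order form $W(z) = A(z/\gamma_1) / (1 - \gamma_2 z^{-1})$ that the specification gives as an example for filter 105. The function names are hypothetical, the default $\gamma_1 = 0.9$ is an illustrative assumption, and $\gamma_2 = 0.7$ simply mirrors the typical preemphasis value quoted in the text. Filter memories start at zero in this sketch.

```python
def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z/gamma): a_i becomes a_i * gamma**i.

    `a` is [1.0, a_1, ..., a_p] for A(z) = 1 + sum_i a_i z^-i.
    """
    return [coef * gamma ** i for i, coef in enumerate(a)]


def perceptual_weight(s, a, gamma1=0.9, gamma2=0.7):
    """Filter s through W(z) = A(z/gamma1) / (1 - gamma2 * z^-1).

    Difference equation:
        w[n] = sum_i a_i * gamma1**i * s[n-i] + gamma2 * w[n-1]
    Assumption: all past samples of s and w are zero.
    """
    num = bandwidth_expand(a, gamma1)
    out = []
    prev_out = 0.0
    for n in range(len(s)):
        acc = sum(num[i] * s[n - i] for i in range(len(num)) if n - i >= 0)
        prev_out = acc + gamma2 * prev_out
        out.append(prev_out)
    return out


if __name__ == "__main__":
    # Impulse response of W(z) for a toy first-order A(z) = 1 - 0.5 z^-1.
    print(perceptual_weight([1.0, 0.0, 0.0, 0.0], [1.0, -0.5]))
```

Because the denominator no longer depends on A(z), only the numerator has to be recomputed when the LP coefficients change from one subframe to the next.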

LP analysis is performed in module 104 on the preemphasized signal s(n) to obtain the LP filter A(z). Also, a new perceptual weighting filter 105 with fixed denominator is used. An example of transfer function for the perceptual weighting filter 105 is given by the following relation.

$$W(z) = A(z/\gamma_1) / (1 - \gamma_2 z^{-1})$$

where ... A(z) is computed based on the preemphasized speech signal s(n), the tilt of the filter $1/A(z/\gamma_1)$ is less pronounced compared to the case when A(z) is computed based on the original speech. Since deemphasis is performed at the decoder end using a filter having the transfer function

$$P^{-1}(z) = 1 / (1 - \mu z^{-1})$$

the quantization error spectrum is shaped by a filter having a transfer function $W^{-1}(z) P^{-1}(z)$. When $\gamma_2$ is set equal to $\mu$, which is typically the case, the spectrum of the quantization error is shaped by a filter whose transfer function is $1/A(z/\gamma_1)$, with A(z) computed based on the preemphasized speech ... this structure for achieving the error shaping by a combination of preemphasis and modified weighting filtering is very efficient for encoding wideband signals, in addition to the advantages of ease of ... computed. This is usually done by subtracting the zero-input response $s_0$ of weighted synthesis filter $W(z)/\hat{A}(z)$ from the weighted speech signal $s_w(n)$.

This zero-input response ... the weighted speech vector in the subframe, and $s_0$ is the zero-input response of filter $W(z)/\hat{A}(z)$, which is the output of the combined filter $W(z)/\hat{A}(z)$ due to its initial states.

The zero-input response calculator 108 is responsive to the quantized interpolated LP filter $\hat{A}(z)$ from the LP analysis, quantization and interpolation calculator 104 and to the initial states of the weighted synthesis filter $W(z)/\hat{A}(z)$ stored in memory module 111 to calculate the zero ... due to the initial states as determined by setting the inputs equal to zero) of filter $W(z)/\hat{A}(z)$. This operation is well known to those of ordinary skill in the ... target vector x.

An N-dimensional impulse response vector h of the weighted synthesis filter $W(z)/\hat{A}(z)$ is computed in the impulse response generator 109 using the LP filter coefficients $A(z)$ and $\hat{A}(z)$ from module 104. Again, this operation is well known... ...the impulse response vector h and the open-loop pitch lag $T_{OL}$ as inputs. Traditionally, the pitch prediction has been represented by a pitch filter having the following transfer function.

$$1 / (1 - b z^{-T})$$

where b is the pitch gain... a new sample). For pitch lags $T > N$, the pitch codebook is equivalent to the filter structure $1/(1 - b z^{-T})$, and a pitch codebook vector $v_T(n)$ at pitch lag T ... from the past excitation until the vector is completed (this is not equivalent to the filter structure).

In recent encoders, a higher pitch resolution is used which significantly improves the quality of voiced sound segments. This is achieved by oversampling the past excitation signal using polyphase interpolation filters. In this case, the vector $v_T(n)$ usually corresponds to an interpolated version of the past... ...minimize the mean squared weighted error E between the target vector x and the scaled filtered past excitation. Error E being expressed as

$$E = \| x - b\, y_T \|^2$$

where $y_T$ is the filtered pitch codebook vector at pitch lag T:

$$y_T(n) = v_T(n) * h(n) = \sum_{i=0}^{n} v_T(i)\, h(n-i)$$

..., which significantly simplifies the search procedure. A simple procedure is used for updating the filtered codevector $y_T$ without the need to compute the convolution for every pitch lag.

Once an optimum ... the search (module 107) tests the fractions around that optimum integer pitch lag.

When the pitch predictor is represented by a filter of the form $1/(1 - b z^{-T})$, which is a valid assumption for pitch lags $T > N$, the spectrum of the pitch filter exhibits a harmonic structure over the entire frequency range, with a harmonic frequency related to ... to achieve efficient representation of the pitch contribution in voiced segments of wideband speech, the pitch prediction filter needs to have the flexibility of varying the amount of periodicity over the wideband spectrum ... of wideband signals is disclosed in the present specification, whereby several forms of low pass filters are applied to the past excitation and the low pass filter with higher prediction gain is selected.

When subsample pitch resolution is used, the low pass filters can be incorporated into the interpolation filters used to obtain the higher pitch resolution. In this case, the third stage of the... ...fractions around the chosen integer pitch lag are tested, is repeated for the several interpolation filters having different low-pass characteristics and the fraction and filter index which maximize the search criterion C are selected.

A simpler approach is to complete the three stages described above to determine the optimum fractional pitch lag using only one interpolation filter with a certain frequency response, and to select the optimum low-pass filter shape at the end by applying the different predetermined low-pass filters to the chosen pitch codebook vector $v_T$ and selecting the low-pass filter which minimizes the pitch prediction error. This approach is discussed in detail below.

Figure 3 illustrates a schematic block diagram ... vector $v_T$ corresponds to the interpolated past excitation signal. In this preferred embodiment, the interpolation filter (in module 301, but not shown) has a low-pass filter characteristic removing the frequency contents above 7000 Hz.

In a preferred embodiment, K filter characteristics are used; these filter characteristics could be low-pass or band-pass filter characteristics. Once the optimum codevector $v_T$ is determined and supplied by the pitch codevector generator 302, K filtered versions of $v_T$ are computed respectively using K different frequency shaping filters such as $305^{(j)}$, where $j = 1, 2, \ldots, K$. These filtered versions are denoted $v_f^{(j)}$, where $j = 1, 2, \ldots, K$.

The different vectors $v_f^{(j)}$ are convolved in ... response h to obtain the vectors $y^{(j)}$, where $j = 0, 1, 2, \ldots, K$. To calculate the mean squared pitch prediction error for each vector $y^{(j)}$, the value $y^{(j)}$ is multiplied by the gain b by means ... vector x by means of a corresponding subtractor $308^{(j)}$. Selector 309 selects the frequency shaping filter $305^{(j)}$ which minimizes the mean squared pitch prediction error

$$e^{(j)} = \| x - b^{(j)} y^{(j)} \|^2, \quad j = 1, 2, \ldots, K$$

To calculate the mean squared pitch prediction error $e^{(j)}$ for each value of $y^{(j)}$, the value $y^{(j)}$ is multiplied by the gain $b^{(j)}$ ... is calculated in a corresponding gain calculator $306^{(j)}$ in association with the frequency shaping filter at index j, using the following relationship.

$$b^{(j)} = x^{t} y^{(j)} / \| y^{(j)} \|^2$$

In ... T, and j are chosen based on $v_T$ or $v_f^{(j)}$ which minimizes the mean squared pitch prediction error e.
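
The selection of a frequency shaping filter can be sketched in Python as below. Each candidate vector is convolved with h, the least-squares gain $b^{(j)} = x^{t} y^{(j)} / \| y^{(j)} \|^2$ is computed, and the index with the smallest error $\| x - b^{(j)} y^{(j)} \|^2$ is kept. Several things here are assumptions made for illustration: the function names, the use of short causal FIR taps as shaping filters, and the convention that index 0 stands for the unfiltered vector $v_T$.

```python
def convolve_truncated(v, h):
    """Return y[n] = sum_{i=0..n} v[i] * h[n-i], truncated to len(v) samples."""
    return [
        sum(v[i] * h[n - i] for i in range(n + 1) if n - i < len(h))
        for n in range(len(v))
    ]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def optimal_gain(x, y):
    """Least-squares gain b minimising ||x - b*y||^2 (0.0 if y is all zero)."""
    energy = dot(y, y)
    return dot(x, y) / energy if energy > 0.0 else 0.0


def select_shaping_filter(x, v_t, h, filters):
    """Return (j, b, e) for the candidate with the smallest prediction error.

    `filters` is a list of hypothetical FIR tap lists; j == 0 denotes the
    unfiltered pitch codebook vector (an assumption of this sketch).
    """
    candidates = [v_t] + [convolve_truncated(v_t, taps) for taps in filters]
    best = None
    for j, v_f in enumerate(candidates):
        y = convolve_truncated(v_f, h)
        b = optimal_gain(x, y)
        e = sum((xi - b * yi) ** 2 for xi, yi in zip(x, y))
        if best is None or e < best[2]:
            best = (j, b, e)
    return best


if __name__ == "__main__":
    v_t = [1.0, -1.0, 0.5, 0.0]
    h = [1.0, 0.5]
    smoothing = [[0.5, 0.5]]            # one illustrative low-pass shape
    target = [1.0, 0.5, -0.5, 0.25]     # toy target vector x
    print(select_shaping_filter(target, v_t, h, smoothing))
```

Only the index j and the gain need to be conveyed to the decoder in such a scheme; the filter shapes themselves would be fixed in advance on both sides.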

Referring back to Figure 1, the pitch codebook index T is encoded and ... approach, extra information is needed to encode the index j of the selected frequency shaping filter in multiplexer 112. For example, if three filters are used ($j = 0, 1, 2, 3$), then two bits are needed to represent this information. The filter index information j can also be encoded jointly with the pitch gain b.

Innovative codebook ...

$$x' = x - b\, y_T$$

where b is the pitch gain and $y_T$ is the filtered pitch codebook vector (the past excitation at delay T filtered with the selected low pass filter and convolved with the impulse response h as described with reference to Figure 3).

The ... gain g which minimize the mean-squared error between the target vector and the scaled filtered codevector

$$E = \| x' - g H c_k \|^2$$

where H is a lower triangular convolution matrix derived ... update.
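
To make the criterion $E = \| x' - g H c_k \|^2$ concrete, the Python sketch below builds a lower triangular convolution matrix from an impulse response and runs an exhaustive search over a toy codebook, computing the optimal gain for each entry. It is only an illustration under stated assumptions: H is taken to be built from h in the usual Toeplitz fashion, the function names are hypothetical, and practical encoders use structured codebooks with much faster search procedures than this brute-force loop.

```python
def convolution_matrix(h, n):
    """Lower triangular n-by-n Toeplitz matrix with H[r][c] = h[r-c]."""
    return [
        [h[r - c] if 0 <= r - c < len(h) else 0.0 for c in range(n)]
        for r in range(n)
    ]


def search_codebook(x_prime, h, codebook):
    """Return (k, g, err) minimising ||x' - g*H*c_k||^2 over the codebook.

    Entries whose filtered version has zero energy are skipped.
    Returns None if every entry is skipped.
    """
    n = len(x_prime)
    H = convolution_matrix(h, n)
    best = None
    for k, c in enumerate(codebook):
        # Filtered codevector z = H * c_k.
        z = [sum(H[r][col] * c[col] for col in range(n)) for r in range(n)]
        energy = sum(v * v for v in z)
        if energy == 0.0:
            continue
        g = sum(p * q for p, q in zip(x_prime, z)) / energy
        err = sum((p - g * q) ** 2 for p, q in zip(x_prime, z))
        if best is None or err < best[2]:
            best = (k, g, err)
    return best


if __name__ == "__main__":
    toy_codebook = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, -1.0],
    ]
    print(search_codebook([0.0, 0.0, 1.5, -0.75], [1.0, 0.5], toy_codebook))
```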

In memory module 111 (Figure 1), the states of the weighted synthesis filter $W(z)/\hat{A}(z)$ are updated by filtering the excitation signal $u = g c_k + b v_T$ through the weighted synthesis filter. After this filtering, the states of the filter are memorized and used in the next subframe as initial states for computing the zero ... known to those of ordinary skill in the art can be used to update the filter states.

DECODER SIDE 

The speech decoding device 200 of Figure 2 illustrates the various steps ... scaled codevector $g c_k$ at the output of the amplifier 224 is processed through an innovation filter 205.

Periodicity enhancement. 

The generated scaled codevector at the output of the amplifier ... improves the quality in case of voiced segments. This was done in the past by filtering the innovation vector from the innovative codebook (fixed codebook) 218 through a filter in the form $1/(1 - \varepsilon b z^{-T})$ where $\varepsilon$ is ...which is part of the present invention, is disclosed whereby periodicity enhancement is achieved by filtering the innovative codevector $c_k$ from the innovative (fixed) codebook through an innovation filter 205 ($F(z)$) whose frequency response emphasizes the higher frequencies more than lower frequencies. The ... is less than 0.5, then periodicity is low.

Another efficient way to derive the filter F(z) coefficients, used in a preferred embodiment, is to relate them to the amount ... higher frequencies are more strongly emphasized (stronger overall slope) for higher pitch gains. Innovation filter 205 has the effect of lowering the energy of the innovative codevector $c_k$ at low ... excitation signal u at lower frequencies more than higher frequencies.

Suggested forms for innovation filter 205 are

(1) $F(z) = 1 - \sigma z^{-1}$, or (2) $F(z) = -\alpha z + 1 - \alpha z^{-1}$ ... pitch codevector $v_T$ from the pitch codebook 201 is then processed through a low-pass filter 202 whose cut-off frequency is adjusted by means of the index j from the demultiplexer as follows.

(3-0.25(1 +rj. 

The enhanced signal $c_f$ is therefore computed by filtering the scaled innovative codevector $g c_k$ through the innovation filter 205 ($F(z)$).
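
For the three-tap form $F(z) = -\alpha z + 1 - \alpha z^{-1}$, the filtering amounts to $c_f(n) = c(n) - \alpha\,[c(n-1) + c(n+1)]$, whose frequency response $1 - 2\alpha \cos\omega$ is smallest at low frequencies and largest at high frequencies. The Python sketch below illustrates this on toy vectors; the function name, the value $\alpha = 0.25$ and the zero padding at the subframe edges are assumptions made for the example.

```python
def innovation_filter(c, alpha):
    """Apply F(z) = -alpha*z + 1 - alpha*z^-1 to the codevector c.

    c_f[n] = c[n] - alpha * (c[n-1] + c[n+1]), with samples outside the
    vector treated as zero (an assumption of this sketch).
    """
    n_samples = len(c)
    out = []
    for n in range(n_samples):
        prev_sample = c[n - 1] if n > 0 else 0.0
        next_sample = c[n + 1] if n + 1 < n_samples else 0.0
        out.append(c[n] - alpha * (prev_sample + next_sample))
    return out


if __name__ == "__main__":
    alpha = 0.25  # illustrative periodicity factor
    constant = [1.0] * 5                       # lowest-frequency content
    alternating = [1.0, -1.0, 1.0, -1.0, 1.0]  # highest-frequency content
    # Interior samples show the gain at each extreme of the spectrum:
    # the constant input is attenuated, the alternating input is boosted.
    print(innovation_filter(constant, alpha))
    print(innovation_filter(alternating, alpha))
```

A larger $\alpha$ steepens the slope of the response, which matches the statement that higher frequencies are emphasized more strongly for higher pitch gains.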

The enhanced excitation signal $u'$ is computed by the adder 220 as

$$u' = c_f + b\, v_T$$

... and the enhanced excitation signal $u'$ is used at the input of the LP synthesis filter 206.
Synthesis and deemphasis 

The synthesized signal s' is computed by filtering the enhanced excitation signal $u'$ through the LP synthesis filter 206 which has the form $1/\hat{A}(z)$, where $\hat{A}(z)$ is the interpolated LP filter in the current subframe. As can be seen in Figure 2, the quantized LP coefficients $\hat{A}(z)$ on line 225 from demultiplexer 217 are supplied to the LP synthesis filter 206 to adjust the parameters of the LP synthesis filter 206 accordingly. The deemphasis filter 207 is the inverse of the preemphasis filter 103 of Figure 1. The transfer function of the deemphasis filter 207 is given by

$$D(z) = 1 / (1 - \mu z^{-1})$$

where $\mu$ is a preemphasis factor ...located between 0 and 1 (a typical value is $\mu = 0.7$). A higher-order filter could also be used.

The vector s' is filtered through the deemphasis filter D(z) (module 207) to obtain the vector $s_d$, which is passed through the high-pass filter 208 to remove the unwanted frequencies below 50 Hz and further obtain $s_h$.

Oversampling ... and then converted to the speech domain, preferably by shaping it with the same LP synthesis filter used for synthesizing the downsampled signal (section).

The high frequency generation procedure in accordance with ... using the spectral shaper 215. In the preferred embodiment, this is achieved by filtering the noise $w_g$ through a bandwidth expanded version of the same LP synthesis filter used in the down-sampled domain ($1/\hat{A}(z/0.8)$).

The corresponding bandwidth expanded LP filter coefficients are calculated in spectral shaper 215.

The filtered scaled noise sequence $w_f$ is then band-pass filtered to the required frequency range to be restored
using the band-pass filter 216. In

the preferred embodiment, the band-pass filter 216 restricts the noise 
sequence to the frequency range 5 7.2 kHz. The resulting band-pass 
filtered noise sequence z is added in adder 221 to the oversampled 
synthesized speech signal (section... 



Claims: 

...in relation to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view of synthesizing a wideband signal, said periodicity enhancing device comprising: a) a factor generator for calculating a periodicity factor related to the wideband signal; and b) an innovation filter for filtering the innovative codevector in relation to said periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion ... innovative codevector.

3 A periodicity enhancing device as defined in claim 1, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where $\alpha$ ... innovative codevector.

7 A periodicity enhancing device as defined in claim 1, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where $\sigma$ is a ... to a pitch codevector and an innovative codevector for supplying a signal synthesis filter in view of synthesizing a wideband signal, said periodicity enhancing method comprising: a) calculating a periodicity factor related to the wideband signal; and b) filtering the innovative codevector in relation to said periodicity factor to thereby reduce energy of a low frequency portion of the innovative codevector and enhance periodicity of a low frequency portion of the codevector.

13 A method for enhancing periodicity as defined in claim 10, wherein said filtering comprises processing the innovation vector through an innovation filter having a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where ...

17 A method for enhancing periodicity as defined in claim 11, wherein said filtering comprises processing the innovation vector through an innovation filter having a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where $\sigma$ is... from said encoded wideband signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients; b) a pitch codebook responsive to said pitch codebook parameters for producing a pitch ... factor generator for calculating a periodicity factor related to the wideband signal, and said innovation filter for filtering the innovative codevector; e) a combiner circuit for combining said pitch codevector and said innovative codevector filtered by said innovation filter to thereby produce said periodicity enhanced excitation signal; and f) a signal synthesis filter for filtering said periodicity enhanced excitation signal in relation to said synthesis filter coefficients to thereby produce said synthesized wideband signal.

22 A decoder for producing a synthesized ... decoder for producing a synthesized wideband signal as defined in claim 21, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where $\alpha$ is ... decoder for producing a synthesized wideband signal as defined in claim 21, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where $\sigma$ ... from said encoded wideband signal at least pitch codebook parameters, innovative codebook parameters, and synthesis filter coefficients; b) a pitch codebook responsive to said pitch codebook parameters for producing a pitch ... and innovative codevector to thereby produce an excitation signal; and e) a signal synthesis filter for filtering said excitation signal in relation to said synthesis filter coefficients to thereby produce said synthesized wideband signal; the improvement comprising a periodicity enhancing generator for calculating a periodicity factor related to the wideband signal, and said innovation filter for filtering the innovative codevector.

32 A decoder for producing a synthesized wideband signal as defined in ... decoder for producing a synthesized wideband signal as defined in claim 31, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where $\alpha$ is... decoder for producing a synthesized wideband signal as defined in claim 31, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where ... innovative codevector.

43 A cellular communication system as defined in claim 41, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where $\alpha$ is ... innovative codevector.

47 A cellular communication system as defined in claim 41, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where $\sigma$ is a...

53 A cellular mobile transmitter/receiver unit as defined in claim 51, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where $\alpha$ ...

57 A cellular mobile transmitter/receiver unit as defined in claim 51, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where ... codevector.

63 A cellular network element as defined in claim 61, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where $\alpha$ ... innovative codevector.

67 A cellular network element as defined in claim 61, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where $\sigma$ is...

73 A bidirectional wireless communication sub-system as defined in claim 71, wherein said innovation filter has a transfer function of the form: $F(z) = -\alpha z + 1 - \alpha z^{-1}$ where $\alpha$ ...

77 A bidirectional wireless communication sub-system as defined in claim 71, wherein said innovation filter has a transfer function of the form: $F(z) = 1 - \sigma z^{-1}$ where $\sigma$...